In this Issue



One of the most competitive areas in the personal computer market today is the 
race to provide printing solutions to meet the needs of the entire family. The 
specifications for a successful printer in this arena include technologies that 
provide continuous improvement in throughput and print quality, low cost, 
attractive small size, quiet operation, ease of use, and designs that lend them- 
selves to high-volume production. 

Design teams at Hewlett-Packard divisions that are responsible for HP color inkjet 
printers decided to take a phased approach to meet the challenges posed by 
these specifications. The HP DeskJet 820C (page 6) is the first product resulting 
from this evolutionary product plan. The DeskJet 820C contains a writing system, print mechanism, and 
package leveraged from the HP DeskJet 850C and a new electronic, firmware, and software architecture 
called the Printing Performance Architecture (PPA). 

PPA grew out of the recognition that newer generations of personal computers have the bandwidth to 
take on some of the computing tasks typically relegated to the printer, and many software applications 
are rapidly moving away from MS-DOS to a Microsoft Windows environment. With this realization,
the design teams developed a software, firmware, and digital electronics architecture that uses the 
computing resources of the PC instead of duplicating these resources in the printer. This architecture 
helped to lower the cost of the printer by reducing RAM from 1M bytes to 128K bytes, ROM from 2M 
bytes to 64K bytes, and the gate count of the largest ASIC by 25%. 

With the reduction in the logic-supporting hardware in the DeskJet 820C, printer functions such as swath 
cutting and data formatting were moved into the software driver. The article on page 12 discusses the 
design of the PPA printer software driver, which implements functions traditionally found in the printer, 
handles PPA communication between the host and the printer, and provides PCL emulation for DOS 
application support. 

Because so many printer functions are implemented in the host software driver, fewer functions are
needed in the firmware for the DeskJet 820C. As described in the article on page 22, "Don't touch the 
dots" was the firmware designers' golden rule. This means that firmware in the printer was designed so 
that it is only responsible for taking the formatted data from the host and sending commands to the motor 
and print cartridge to place the dots at the appropriate places on the paper. The printer firmware is also 
responsible for user interface and status functions.

ASIC development for the PPA printer controller and the inkjet printhead drive electronics is described 
in the articles on pages 31 and 38 respectively. A typical digital controller for a printer contains a micro- 
processor to control the printer, RAM for incoming data, ROM for firmware, and custom logic for printer- 
specific functions. For the DeskJet 820C, these functions were integrated on one chip and optimized to 
meet the requirements of PPA. The pen drive electronics are responsible for driving signals to eject the 
ink from the pen and providing a control system to maintain a constant temperature in the active area of 
the pen. For the DeskJet 820C's pen drive electronics, the functions of four ICs were integrated in one 
chip, and all the electronics related to the pens were moved onto the carriage's printed circuit assembly. 

Today, key design decisions associated with developing a microprocessor not only focus on technical 
requirements such as a higher speed, but also on business and marketing requirements. A few years 
ago HP began developing a line of PA-RISC processors to meet the needs of higher-volume and more 
cost-sensitive products. The article on page 43 introduces four articles that describe the latest proces- 
sor in this line, the HP PA 7300LC. The HP PA 7300LC processor is optimized for entry-level to midrange 
high-volume systems such as workstations and servers. 

The PA 7300LC processor is the result of leveraging the superscalar CPU core from the HP PA 7100LC 
processor, adding a large embedded primary cache, and reducing the chip area and pipeline stalls. The 
article on page 48 describes the PA 7300LC microarchitecture, the CPU core, and the memory and I/O 
controller. The leveraging effort, the chip area reduction, and the redundant cache RAM arrays are 
discussed in the article on page 61. 

No matter how much leveraging is done or how mature the IC fabrication process, functional verification
of a new chip is always an important step in the process. The article on page 69 describes the processes 
used in the presilicon and postsilicon phases to verify the correctness of the PA 7300LC processor. 

The HP 9000 D-class server (page 73) and the HP 9000 B-class workstation (page 82) are examples of 
products that use the PA 7300LC processor. The D-class server is targeted for the high-volume environment
of departmental and branch computing. The article includes a comparison between different models
of D-class servers that use HP processors other than the PA 7300LC. The HP 9000 B-class workstation is 
comparably priced to the HP 9000 Model 715 workstation but has superior performance and I/O capabili- 
ties. The article focuses on how cooperative engineering between the various entities involved in product 
development helped to reduce the time to market for this product. 

Software testing is always one of the most critical phases of the software development process. If test 
planning is late or inadequate, the test effort can cause late, or worse, low-quality products. The level of 
testing and the pass/fail criteria vary with the type of software. For example, software used in video games 
would not be tested the same way or have the same pass/fail criteria as software used in patient monitors. 
The articles beginning on page 89 describe the processes, languages, and tools the authors have devel- 
oped for testing safety-critical software. In this case the safety-critical software involves software used 
in the HP OmniCare patient monitors, which monitor the physiological parameters of critically ill patients. 

The evolution of the software testing process for the HP OmniCare patient monitors and resulting test 
tooling called testware are described in the first article. The next article (page 95) describes a high-level 
programming language called ATP (Automatic Test Processor), which allows the integration of existing 
test processors used for validation. The AutoCheck program (page 103) evaluates test files and documents
the results of the evaluations. The final article (page 109) describes how these test tools can be used to
help in testing localized software. 



C.L. Leath 
Managing Editor 



Cover 

The cover shows an artistic rendition of the change in the printing model brought on by the Printing 
Performance Architecture (PPA) implemented in the HP DeskJet 820C. The top figure depicts printing 
before the PPA where most of the printing logic resides in the printer. The lower figure depicts printing 
after the PPA where most of the printing logic resides in the host computer. 



What's Ahead 

Featured in our August issue will be:

• The design and verification of the HP PA 8000 and PA 8200 four-way superscalar CPUs 

• The HP OpenCall family of telecommunications platforms based on intelligent network concepts 

• Software to test policing in ATM networks 

• An object-oriented database management system for large historical data archives

• The HP 4500 benchtop inductively coupled plasma mass spectrometer 

• Five papers from the 1996 HP Design Technology Conference. 

Reminder 

Because we are getting ready for a new Journal design and focusing on other projects, we won't be
publishing an issue in October.

A Lower-Cost Inkjet Printer Based on
a New Printing Performance 
Architecture 

The HP DeskJet 820C printer is the first HP inkjet printer in an 
evolutionary product plan that takes advantage of computer and operating 
system trends to make inkjet printing affordable for more users. The 
printer's integrated software, firmware, and digital electronics 
architecture uses the computational resources in the PC instead of 
duplicating these resources in the printer. 

by David J. Shelley, James T. Majewski, Mark R. Thackray, and John L. McWilliams



The two Hewlett-Packard divisions in Vancouver, Washington
are responsible for establishing and maintaining HP color
inkjet printers as market-leading personal products in the
home and office computing environments. These divisions
have a ten-year history of successful products starting with
the original HP DeskJet printer in 1986 and culminating
most recently with the introduction of the new HP DeskJet
820C (Fig. 1) in the spring of 1996.

Our competitors, of course, have also been introducing 
products, some of which incorporate newly developed technologies
that strongly challenge the performance, print quality,
and cost-effectiveness of our own. It is clear that our
competitors are here for the long term, so we must develop 
long-term strategies to compete with them. 

Aside from competition, we also have before us an excellent 
opportunity to broaden our printing solutions to embrace 
the needs of the entire family, a step well beyond the traditional
"take work home" professional who has been our
mainstay home customer. These new customers have dis- 
tinctly different needs that will require insightful under- 
standing as well as timely incorporation of focused innova- 
tions in our products. 




Fig. 1. HP DeskJet 820C color inkjet printer.



At the beginning of the HP DeskJet 820C project, it was
clear that our ability to retain and grow our market leadership
depended heavily upon our ability to deal with these
two powerful market dynamics. We knew that we had to
simultaneously stay ahead of the competition and satisfy
the rapidly increasing breadth of home printing needs. The
ingredients for long-term success in this endeavor were 
equally clear: 

• Technologies that result in continuously improving print 
throughput and quality 

• Designs that earn adequate profits at reduced customer 
prices 

• Designs that appeal to home customers by virtue of small
size, attractive industrial designs, very quiet operation, and 
unparalleled ease of use 

• Designs capable of high-volume production at multiple 
international factory sites 

• The ability to design products to hit narrow market
windows. 

We realized that no single product program could successfully
satisfy all of these criteria, so we needed to develop a
phased approach. We decided that each new product devel- 
opment effort should leverage previous capabilities while 
incorporating a small set of new and innovative capabilities 
focused on our customer needs. These new capabilities 
would then be leveraged forward into succeeding efforts. 
In this fashion we could ensure a timely series of product 
introductions, each building upon previous successes and
incrementally providing new capabilities that would ulti- 
mately satisfy all of our strategic initiatives. In addition to 
the market timeliness gained by a phased approach, we also 
knew that this plan would use scarce development resources 
in the most efficient manner. 

Design Objectives 

The HP DeskJet 820C printer is the first product in this evo- 
lutionary product plan. In keeping with our overall strategy, 
the primary objectives of the development program were to: 

• Leverage the speed and print quality afforded by the new
writing system developed for the HP DeskJet 850C

• Leverage the printing mechanism of the HP DeskJet 850C

• Innovate by offering this printing capability at a greatly
reduced price for home customers

• Introduce in the spring of 1996.

While reduced cost and a spring 1996 introduction were
clearly the primary objectives for the HP DeskJet 820C effort,
we also decided to begin our journey towards consumer
design by making industrial design changes that fit within
the constraints of a leveraged mechanism and package.

Since we had decided to leverage the writing system, print 
mechanism, and package, we needed to examine the elec- 
tronic, firmware, and software driver subsystems to find cost 
reduction opportunities. Based on our initial investigation 
we set a program goal to reduce direct material cost by 30%.

Design Approaches 

Our first design tactic recognized two trends. First, newer
generations of personal computers have more than enough
bandwidth to take on some of the computing load that has
until now resided in the printer itself. Second, software
applications are rapidly moving away from MS-DOS and into
the Microsoft Windows environment. In view of these
trends, we decided that the HP DeskJet 820C printer would
not support printing from standalone DOS applications. This
enabled us to develop an integrated software, firmware, and
digital electronics architecture that uses the computational
resources in the PC instead of duplicating these resources in
the printer. We call the architecture Printing Performance
Architecture, or PPA. This architectural choice enabled us
to achieve half of our 30% cost reduction goal by reducing
RAM from 1M bytes to 128K bytes, ROM from 2M bytes to
64K bytes, and gate count in our largest ASIC by 25%. At the
same time, higher-power PCs enabled us to maintain and in
many cases improve system throughput.

A second critical design decision was to disallow simultaneous
firing of the black and color print cartridges during a
single print swath. While this strategy achieved an additional
20% of our overall goal, the obvious risk was a reduction in
throughput for documents that contain juxtaposed black
and color. However, we felt that our new system architecture
would mitigate this risk and still allow us to meet our
performance objectives. This single decision allowed us to
simplify the drive electronics for the print cartridge to the
point where they could be located on a small, carriage-mounted
printed circuit assembly rather than on the main
logic printed circuit assembly. This change, in turn, enabled
two other very significant cost reductions. First, the interface
between the logic and carriage printed circuit assemblies
was dramatically simplified, allowing the use of standard
and easily available cables and connectors rather than the
custom designs that we had previously used. Second, using
this new partitioning of analog functions, the design team
was able to implement the required capability using two
custom analog ASICs in contrast to the four that had been
used in the DeskJet 850C.

An additional 10% of our cost goal was achieved by capitalizing
on three cost saving opportunities in our power supply.
First, the initial HP DeskJet 850 power supply was specified
with significant margin to allow flexibility for the newly
developed writing system in that product. However, the HP
DeskJet 820C development team had the advantage of a
stable writing system and therefore could specify power
needs more precisely. Second, we modified the user interaction
model with the printer's power functions and were
able to eliminate some of the complex capabilities that were
included in the HP DeskJet 850. Third, we specified our
power supply at a very high level of abstraction to use the
design expertise of our vendor base to deliver cost-optimal
implementations.

Several sources contributed to the final 20% of our cost
reduction goal. Our new system architecture and new partitioning
of analog functionality allowed a significant reduction
in the size of our printed circuit assemblies. Direct material
cost savings were realized by elimination of the connectors
and support components for interconnecting to Apple PCs.
Focused design work to cost-optimize our EMI and ESD
solutions eliminated many discrete electronic components.

As a result of our plan to leverage and our focus on limited
but meaningful innovation, the HP DeskJet 820C was introduced
to the market on schedule in the spring of 1996 following
a development effort that exceeded objectives by
achieving a 38% direct material cost reduction and actual
performance nearly twice our initial expectations. The
techniques responsible for this success have been carried
forward and are already incorporated into the next products
in our evolutionary process.
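
The four contributions described in this section add up to the whole 30% goal. The short Python tally below is only an illustrative check of that arithmetic: the share figures come from the text, while the labels and the helper name goal_points are ours. The text does not say how the 38% actually achieved was divided among these sources, so that figure is not broken down here.

```python
# Shares of the 30% direct material cost reduction goal, as stated in the text.
GOAL_PERCENT = 30

shares = {
    "PPA architecture (RAM, ROM, ASIC gate count)": 50,  # "half of our 30% goal"
    "no simultaneous black and color firing": 20,
    "power supply": 10,
    "other sources (board size, connectors, EMI/ESD)": 20,
}


def goal_points(share_percent, goal_percent=GOAL_PERCENT):
    """Convert a share of the goal into percentage points of material cost."""
    return goal_percent * share_percent / 100


breakdown = {name: goal_points(share) for name, share in shares.items()}

for name, points in breakdown.items():
    print(f"{name}: {points:.0f} points")
print(f"total: {sum(breakdown.values()):.0f} points of a {GOAL_PERCENT}-point goal")
```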

Printing Performance Architecture 

The process of printing a document created on a computer
involves several steps to transform and prepare the information.
In the traditional Windows model used by inkjet printers,
the printer driver software receives a description of the
page from the application, transforms that description into
a mechanism-independent format that can be understood by
the printer, and encodes it into a standard printer language.
The encoded description is then transferred to the printer.
The printer decodes the data and formats it for its particular
printing mechanism. To encode the information for transfer
to the printer, Hewlett-Packard developed a standard language
called PCL (Printer Control Language). Because of
the widespread use of HP printers, this language has become
a de facto standard. PCL allows the computer to prepare an
image for printing without detailed knowledge of the
mechanical details of the printer.

For the Microsoft Windows environment, HP has always
developed the software drivers for its inkjet printers. In the
Windows model, the application sends a page description to
the driver through the operating system. The description is
in the form of drawing objects (lines, rectangles, text, etc.).
The driver then rasterizes the description. Rasterization is
the process of mapping the page description to an X-Y plane
or bitmap. At this point, the data still must undergo several
more transformations before it can be used to print. For
example, the first bitmap may be 24-bit data at 300 dpi, whereas
the inkjet mechanism may be 600 dpi and only able to put
one of four colors at each pixel (black, cyan, magenta, yellow).
Traditionally, the driver performed some of the needed
transformations, but left many of the more compute-intensive
ones to dedicated hardware and firmware in the printer.
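
As a rough illustration of one such transformation, the Python sketch below doubles the resolution of a 300-dpi bitmap to 600 dpi by pixel replication. This is an assumed, deliberately naive method chosen for clarity; the article does not say how the driver or printer actually scales the data, and the function name replicate_2x is ours. Reducing each 24-bit pixel to the inks the mechanism can place is a separate step that is not shown.

```python
def replicate_2x(bitmap):
    """Double the resolution of a bitmap by repeating each pixel in a 2x2 block."""
    out = []
    for row in bitmap:
        wide = [pixel for pixel in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))                             # repeat vertically
    return out


# One row of two 24-bit RGB pixels at 300 dpi (red, then blue).
src = [[(255, 0, 0), (0, 0, 255)]]
dst = replicate_2x(src)

# Going from 300 dpi to 600 dpi quadruples the number of pixels per square inch.
pixels_per_sq_inch_300 = 300 * 300
pixels_per_sq_inch_600 = 600 * 600
print(len(dst), "rows of", len(dst[0]), "pixels")
print(pixels_per_sq_inch_600 // pixels_per_sq_inch_300, "times as many pixels")
```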

After the driver has performed all of its computations, it
encodes the information using the subset of PCL needed for
bitmapped data. The printer in turn decodes the PCL and
performs all of the necessary further computations to format
the data for the printing mechanism. Manipulations the
printer must perform include, but are not limited to, some
color transformations, cutting the data into individual swaths,
and separating the data into columns (inkjet cartridges are
composed of two columns of staggered nozzles). This
process is diagrammed in Fig. 2.

Fig. 2. Traditional PCL printing model.
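
Swath cutting and column separation can be pictured with a small Python sketch. It assumes a raster stored as a list of rows and assumes that the two staggered nozzle columns simply take alternating rows; the article gives neither the data layout nor the nozzle-to-row mapping, so the function names and the even/odd split are illustrative only.

```python
def cut_swaths(raster, nozzles):
    """Split a raster (a list of rows) into swaths of at most `nozzles` rows."""
    return [raster[i:i + nozzles] for i in range(0, len(raster), nozzles)]


def split_columns(swath):
    """Separate a swath into the rows printed by each of two nozzle columns.

    Assumption: even-numbered rows go to one column, odd-numbered rows to the other.
    """
    return swath[0::2], swath[1::2]


# A toy raster: 10 rows, 4 pixels wide, each row filled with its own row number.
raster = [[row] * 4 for row in range(10)]

swaths = cut_swaths(raster, nozzles=4)   # rows 0-3, 4-7, and a short final swath 8-9
even_rows, odd_rows = split_columns(swaths[0])

print(len(swaths), "swaths")
print("first column rows:", [r[0] for r in even_rows])
print("second column rows:", [r[0] for r in odd_rows])
```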

The process in the MS-DOS environment is similar with two
exceptions. One, the application must perform the rasterization
for all graphics and PCL encoding, and two, the printer
will accept nonrasterized text, alleviating the need for the
application to do it. Because of this second difference, previous
inkjet printers were required to have extensive memory-intensive
fonts built into them. In addition, the printer had to
contain firmware and hardware to rasterize the fonts. Since
the data manipulations performed were extensive, they
required a powerful microprocessor and significant amounts
of dedicated hardware.

The concept of the new Printing Performance Architecture,
or PPA, is to change this model by eliminating some of the
steps. Because modern personal computers have powerful
microprocessors and a large amount of system memory, the
task of data formatting for the print mechanism is moved
entirely to the host computer. Also, because the data is no
longer in a PCL-compatible format, PCL is not used to transmit
the data to the printer. Instead, a very simple proprietary
protocol was developed. The protocol is simple enough that
the hardware can automatically depacketize the data without
help from the firmware. The data is then directly used to print
the image on the page. This process is diagrammed in Fig. 3.
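
The article does not describe the PPA packet format. Purely to illustrate why a simple, fixed framing can be unpacked without interpretation, the Python sketch below uses a hypothetical header of a 1-byte packet type followed by a 2-byte big-endian payload length. The header layout, the type values, and the function names are all assumptions and should not be read as the real protocol.

```python
import struct

# Hypothetical header: 1-byte packet type, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")


def packetize(kind, payload):
    """Prefix a payload with the hypothetical fixed-size header."""
    return HEADER.pack(kind, len(payload)) + payload


def depacketize(stream):
    """Walk a byte stream and return a list of (kind, payload) pairs."""
    packets = []
    offset = 0
    while offset < len(stream):
        kind, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        packets.append((kind, stream[offset:offset + length]))
        offset += length
    return packets


# Two packets back to back: one carrying two bytes of dot data, one empty.
stream = packetize(1, b"\x0f\xf0") + packetize(2, b"")
print(depacketize(stream))
```

Because every header has the same size and states the payload length, the receiver needs only a counter and a comparison to find packet boundaries, which is the kind of work that fits in hardware.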

Advantages of PPA 

The primary advantages of PPA are cost and performance. A
PPA printer can deliver performance similar to a traditional
non-PPA printer at a reduced cost. Alternatively, it can deliver
higher levels of performance at a similar cost. The reasons
for the cost advantage fall into two areas: less memory is
required (both RAM and ROM), and a lower-performance
microprocessor can be used in the printer because the
microprocessor doesn't have to touch the data.

Memory costs are a significant portion of the material cost
of a low-end printer. A PPA printer requires significantly less
ROM and RAM. First, the PPA printer doesn't have to store
any internal fonts. Traditional printers supported both the
Windows environment and the DOS environment. The printing
model in the DOS environment requires the printer to
store font information. A DOS application sends an ASCII
code for the desired text character. To print that character,
the printer needs a bitmap for that character in its ROM.
In contrast, applications in the Windows environment send
only bitmapped graphic information to the printer, never
ASCII text. Because the PPA printer is designed exclusively
for the Windows environment, it doesn't need to store the
fonts in ROM.

Second, because the printer doesn't do any PCL decoding,
swath cutting, or data formatting, the printer requires much
less firmware, again saving ROM. The primary functions of
the printer firmware are mechanism control, input/output,
and the user interface. In the HP DeskJet 820C, the firmware
is stored in only 64K bytes of ROM. Because there is so little,
it was possible to integrate the ROM into the digital ASIC.
Previous non-PPA printers of similar capability used 512K
bytes or more of ROM.

Finally, because the processor doesn't touch the data and
doesn't need to create any intermediate forms of the image
data, the printer requires less RAM. The HP DeskJet 820C
uses a 128K-byte DRAM. The previous generation, non-PPA
printer used 512K to 1M bytes of RAM. Because there are
fewer memory ICs, the memory cost for a PPA printer is
much lower. The reduced number of memory devices also
reduces the printed circuit board area, again saving cost.
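
The size of these memory reductions is easy to check. The Python snippet below uses only the figures quoted in this section; because the text says previous printers used "512K bytes or more" of ROM, the ROM ratio is a lower bound. The variable names are ours.

```python
K = 1024  # bytes per K byte

rom_before, rom_after = 512 * K, 64 * K               # "512K bytes or more" vs. 64K
ram_before_low, ram_before_high = 512 * K, 1024 * K   # "512K to 1M bytes"
ram_after = 128 * K

rom_ratio = rom_before / rom_after                    # at least this much smaller
ram_ratio_low = ram_before_low / ram_after
ram_ratio_high = ram_before_high / ram_after

print(f"ROM: at least {rom_ratio:.0f}x smaller")
print(f"RAM: {ram_ratio_low:.0f}x to {ram_ratio_high:.0f}x smaller")
```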

The second factor in saving cost comes from the need for
less microprocessor horsepower. In a PPA printer, the processor
does not do swath cutting and formatting of the data.
Its primary functions are mechanism control, input/output,
and the user interface. This requires a less complex and
consequently less expensive microprocessor. The HP DeskJet
820C uses a Motorola 68EC000. The 68EC000 can be configured
with either an 8-bit or a 16-bit data bus. In the HP DeskJet
820C, the processor is used in 8-bit mode. This reduces
the bus width in the digital custom ASIC, again saving area
and hence cost.

Finally, because of the simplified data path in the printer (the
data path is the path the data takes from the input/output
port, through the ASIC, and out to the print cartridge), it was
possible in the HP DeskJet 820C to design a data path in
which the processor doesn't touch the image data. A dedicated
hardware data path is always much faster, albeit less
flexible, than a data path in which the processor must transform
or handle the data. A full hardware data path is not
limited to a PPA architecture, but is much easier to accomplish
in a PPA printer because of the simplified data path.

Fig. 3. New Printing Performance Architecture (PPA) printing model.

Challenges of PPA

While PPA has some significant advantages, it also brings 
with it some challenges: 

• DOS is supported only through Windows and not in a
standalone environment.

• PPA hosts must be more powerful than hosts for an
equivalent non-PPA printer.

• The printer driver requires detailed knowledge of the
printing mechanism.

• PPA required a change in the development and manufacturing
paradigm at the HP Vancouver Division.

The PPA architecture does not support printing in the traditional
standalone DOS environment. In the Windows environment,
all information sent to the printer is bitmapped
graphics. The data is prepared under the control of a single,
HP-designed and optimized printer driver. In the older DOS
environment, application vendors write their own printer
drivers. Applications send ASCII codes to the printer and
expect the printer to use its own internal fonts to generate
the bitmapped characters. The applications have no knowledge
of the printing mechanism and hence are unable to do
any swath cutting or data formatting.

The HP DeskJet 820C does support printing from a DOS
application if the application is run under the Windows
environment. Windows allows DOS-only applications to be run
in a DOS box. Printing in this environment uses the standard
Windows printing mechanism and hence the HP driver.

PPA printers require a higher-powered host than non-PPA
printers to achieve comparable levels of performance.
Because the job of swath cutting and data formatting is now
done by the PC, more computing power is required. On the
HP DeskJet 820C, acceptable levels of performance are
achieved with a 66-MHz Intel486-based machine with 8M
bytes of RAM.

PPA required a shift in the HP Vancouver Division's development
and manufacturing paradigm. Having designed and built
PCL-based printers for over 10 years, all of the division's tools
and processes were centered around this type of printer. For
instance, over the years the manufacturing and customer
assurance organizations had developed many tools based
around PCL printers for doing production tests and exercising
the printer in environmental tests. None of these tools
work with a PPA printer. Similarly, the firmware test
organization had to revise its tests completely. Because the HP
DeskJet 820C printer has only 64K bytes of ROM, extensive
demo pages and self-test pages could no longer be included
in the printer.

Because of the high level of integration and because the
architecture follows the paradigm that "the processor
doesn't touch the dots," it is difficult to observe the flow of
data through the machine. This made debugging problems
during development quite challenging. This problem was
solved in several steps. First, the ASIC design team did
extensive simulations. Second, the team used a hardware
emulator to emulate the digital ASIC. This emulator had a
mechanism that provided ports to internal nodes so that
they could be observed with a logic analyzer. Finally, simple
patterns were devised and sent through the architecture that
simplified problems and made debugging possible.



Finally, in the PPA environment, the driver must have knowledge
of the printing hardware. This makes the driver less
universal and the job of leveraging the driver to future products
more difficult. The driver was carefully organized and
modularized so the hardware-dependent pieces can be
changed while the underlying driver features can be leveraged
into future products.

Inside the Printer 

Inkjet printing is a complicated process that involves tying
together several electromechanical subsystems that work
together to create the printed page. All inkjet printers consist
of these major subsystems regardless of the particular
implementation used for each one. Fig. 4 shows the HP
DeskJet 820C printer with its top cover removed and the
major subsystems labeled.

Paper Path. The paper path is responsible for moving paper
through the printer. The user inserts paper or envelopes into
the input tray. At the appropriate time, a single piece of
paper is picked from the stack and begins moving through
the printer. Each time the carriage finishes a pass over the
paper, the paper is advanced an appropriate amount to prepare
for the next pass of the carriage. At the end of a page,
the paper is "kicked," or deposited in the output tray, where
the user can remove it. In the HP DeskJet 820C, a single
electric motor is used to move the paper. Paper movement
is open-loop; there is no feedback about the actual paper
position. The paper path in the HP DeskJet 820C gracefully
handles a variety of paper sizes and thicknesses as well as
envelopes.

Fig. 4. HP DeskJet 820C printer subsystems.

Carriage. The carriage holds the pens used in the printer. To
print a swath of data, the printer moves the carriage across
the page at a constant speed, firing the pens at appropriate
times. A single motor is used to move the carriage. Carriage
movement is a closed-loop process. The carriage's position
is tracked using an LED, which shines on a photoreceptor,
and a strip of plastic made up of alternating light and dark
regions placed between the LED and the photoreceptor. As
the carriage moves across the page, logic recognizes when
the LED is in front of a dark region and when it is in front
of a transparent region. Using this information, it tracks the
carriage's position on the page. In addition to holding the
pens, the carriage holds a printed circuit board. On the
board are pads that connect electrically to the pens and a
portion of the electronics needed to drive the pens. In the
HP DeskJet 820C, all electronics directly used to fire the
pens are located on the carriage board (see article, page 38).
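
The position tracking described above amounts to counting light/dark transitions on the encoder strip. The Python sketch below shows that idea under stated assumptions: the photoreceptor output is modeled as a list of sampled 0/1 values, and the direction of travel is supplied by the caller, since the article does not say how direction is sensed. The function name and sample format are illustrative, not the printer's actual logic.

```python
def track_position(samples, start=0, direction=+1):
    """Count light/dark transitions to estimate carriage position in strip regions.

    samples   -- sequence of 0 (dark) / 1 (transparent) photoreceptor readings
    start     -- position count before the first sample
    direction -- +1 or -1; assumed known from the commanded motor direction
    """
    if not samples:
        return start
    position = start
    previous = samples[0]
    for current in samples[1:]:
        if current != previous:      # the carriage crossed a region boundary
            position += direction
        previous = current
    return position


readings = [0, 0, 1, 1, 0, 1, 1, 0]   # four boundaries crossed
print(track_position(readings))                           # moving one way
print(track_position(readings, start=10, direction=-1))   # moving back
```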

Print Cartridges. The print cartridges in the HP DeskJet 820C
are user-replaceable cartridges that contain both the ink and
the mechanism for placing the ink on the paper (thermal
inkjet). They are often referred to simply as the "pens." The
pens are the same as those used in the HP DeskJet 850 and
870 printers. There are two pens: a black pen and a color
pen. The black pen has 300 nozzles spaced at 1/600 inch.
The swath height for black is therefore 1/2 inch. The color
pen holds three colors of ink: cyan, magenta, and yellow.
Each color is printed with a series of 64 nozzles spaced at
1/300 inch. The swath height is therefore approximately
1/5 inch. Colors other than cyan, magenta, and yellow are
created by placing dots of these three colors in close proximity
in appropriate ratios. Since at a distance of more than
a few inches the resolution of the eye is not great enough to
discern the individual dots, they blend together visually,
forming the desired colors.
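
The swath heights quoted above follow directly from nozzle count times nozzle spacing. A short Python check using exact fractions (the helper name swath_height is ours):

```python
from fractions import Fraction


def swath_height(nozzles, pitch_inches):
    """Swath height in inches: number of nozzles times the nozzle spacing."""
    return nozzles * pitch_inches


black = swath_height(300, Fraction(1, 600))   # 300 nozzles at 1/600 inch
color = swath_height(64, Fraction(1, 300))    # 64 nozzles per color at 1/300 inch

print("black swath:", black, "inch")
print("color swath:", color, "inch, about", round(float(color), 3))
```

The black swath is exactly 1/2 inch, and the color swath is 16/75 inch, which is the "approximately 1/5 inch" given in the text.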

Pen Service Station. To maximize the life of the pens and to
maintain optimum print quality over that life, it is necessary
to service the pens. Servicing includes but is not limited to
such actions as capping the pens when not printing so that
they do not dry out and wiping them on occasion to prevent
ink buildup. The service station includes all the electrical
and mechanical parts necessary to perform the servicing
actions. In particular, it includes a motor that is used to
actuate actions such as wiping and capping. The motor is
controlled by an open-loop process.

Power Supply. A power supply is needed to provide energy
to the printer. The power supply accepts an ac signal from
a standard outlet and converts it to the dc voltages and
currents used to power the printer. Because the HP DeskJet
820C will be sold worldwide, it is capable of running on all
permutations of 50/60-Hz and 110/220-V inputs found around
the world.

Digital Electronics. The digital electronics are responsible
for controlling all of the other electromechanical parts. The
digital electronics generally include at least one of each of
the following: a microprocessor, a ROM, a DRAM or SRAM
or both, a block of custom logic, and an EEPROM. The
microprocessor controls all mechanism movements, I/O,
the user interface, and print data manipulation if necessary.
The ROM holds firmware and, in previous products but not
in the HP DeskJet 820C, fonts. The volatile memory is used
to hold firmware variables and print data and commands
that arrive over the I/O port. The custom logic implements
printer-specific functions that require hardware support.
The EEPROM holds information that must be retained
through a power cycle. In the HP DeskJet 820C, the
microprocessor, the ROM, the custom logic, and an SRAM are all
integrated into a single ASIC (see articles, pages 22 and 31).

Case. The case is the part of the printer that the customer
sees, so every effort is made to make it attractive. The case
includes a small panel of LEDs and buttons by means of
which the user interacts with the printer. The front panel of
the HP DeskJet 820C is very simple, consisting of just two
buttons and three LEDs. The case also has a door that can
be lifted to gain access to the pens.

Driver. In addition to the physical part of the printer, all
printer products require a software driver, which resides
on the host computer. The driver allows applications
software running on the PC to interact with the printer. In most
modern operating systems, an application that wishes to
print calls the printer driver through the operating system.
This model allows the printer manufacturer to supply the
driver, so application suppliers don't have to. The exception
to this model is DOS, which requires that the driver be
integrated into the application. Because of the simplifications
that can be made to the printer, the HP DeskJet 820C only
works with Windows applications, or DOS applications running
in a DOS box (see above and the article on page 12).

HP DeskJet 820C Printing Sequence 

To begin the printing sequence, the user chooses Print from
the appropriate menu in the application. The application
formats the page into the standard description format used
by the Windows operating system. Using this format, the
application passes a description of the page to the printer
driver. The driver reformats the page into a form appropriate
for sending to the printer. In the process of reformatting the
image, the driver performs various transformations to map
the image to the inkjet printing technology. In previous HP
inkjet printers, the format used to send data to the printer
was PCL, a page description language. In the HP DeskJet
820C, the format is a bitmapped image that can be used to
fire the printheads with minimal further transformations.

Once the image is in the right format, data is sent to the
printer over the I/O cable. Before the data can be printed, the
driver must send commands to the printer that tell it to
prepare to print a page. When the driver sends these
commands, the printer first uncaps the pens and services them
to prepare them for printing. Then it picks a piece of paper
and advances it to the first spot where printing will occur.

After the printer is prepared and has enough data in its local
memory to print an entire swath, it performs a print sweep by
moving the carriage across the page. As it moves the carriage,
it pulls data out of its local memory, performs some final
formatting, and uses the data to fire the printheads at
appropriate times. After the sweep has been completed, the
printer advances the paper, waits for enough data to print the
next swath to arrive over the I/O, and then, upon command from
the driver, prints the data. The process repeats for the rest
of the page. At the end of the page, again upon command from
the driver, the printer kicks the paper, depositing it in the
output tray. Assuming that there are no further pages to be
printed, the printer then parks the carriage over the service
station, caps the pens, and performs other cleanup pen
servicing. The printer then waits patiently until the next
time it is called upon to print.
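
The driver's side of this sequence can be pictured as an
ordered stream of messages for one page. The sketch below is
only an illustration: LOAD_MEDIA, PRINT_SWEEP, and EJECT_MEDIA
are the printing commands named in the driver design article,
while the IMAGE_DATA tag and the tuple layout are assumptions
made for this example.

```python
def page_command_stream(swaths):
    """Yield the host-to-printer traffic for one page, in order.

    `swaths` is a list of bytes objects, one per print sweep.
    """
    yield ("LOAD_MEDIA", b"")           # printer uncaps, services pens, picks paper
    for swath in swaths:
        yield ("IMAGE_DATA", swath)     # hypothetical tag: swath data goes first
        yield ("PRINT_SWEEP", b"")      # the printer sweeps only on command
    yield ("EJECT_MEDIA", b"")          # page is kicked into the output tray

stream = list(page_command_stream([b"\x0f\xf0", b"\xff\x00"]))
for name, payload in stream:
    print(name, len(payload))
```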



Summary 

The advance of personal computer horsepower and the uni- 
formity of the Windows printing environment in which HP 
has control of the printer driver have made it possible to 
change from the PCL printer model to a PPA printer model. 
The customer benefit is that PPA printers can provide equiv- 
alent levels of performance at a much lower cost. 

Microsoft, Windows, and MS-DOS are U.S. registered trademarks of Microsoft Corporation.

PPA Printer Software Driver Design



The software driver for the HP DeskJet 820C printer performs many 
functions that were formerly performed in the printer, including swath 
cutting, data formatting, and communications. The driver also includes 
a PCL emulation module for DOS application support. 

by David M. Hall, Lee W. Jackson, Katrina Heiles, Karen E. Van der Veer,
and Thomas J. Halpenny



The software driver for the new HP DeskJet 820C printer
includes many new functions that need to be performed on
the host computer because of the printer's Printing
Performance Architecture (PPA). In older PCL (Printer Control
Language) printers, these functions were performed in the
printer. Fig. 1 shows the differences. These functions
include:

• Swath cutting

• Data formatting 

• PPA communications 

• PCL emulation for DOS application support. 

This article provides an overview of the changes necessary 
for supporting PPA and then discusses each of the functions 
listed above in more detail. 

Driver Overview 

Under the Windows operating system, printer drivers are
responsible for supporting a specific API (application
programming interface) known as the DDI (Device Driver
Interface). This interface gives the driver fairly high-level
drawing commands. It is up to the driver to take those commands
and produce a bitmap that can be encapsulated in a language
and sent to the printer.

Typically, within a Windows printer driver, a rendering engine
takes the DDI commands and produces a rendered bitmap.
A halftoning algorithm is performed on the rendered bitmap
and a halftoned bitmap is produced. This halftoned bitmap
is typically in a format that can be encapsulated in a language
such as PCL and then given to the printer.

For the HP DeskJet 820C, this halftoned bitmap has to be
put through additional processing as shown in Fig. 1 to
create data that is ready to be printed by the printer's
electronics directly. This additional processing includes swath
cutting and sweep formatting.

Since the HP DeskJet 820C does not understand PCL (Printer
Control Language), a PCL emulation module is necessary to
provide support for DOS applications. The DOS application
data stream is captured by a DOS redirector and passed to
the PCL emulator, which produces a halftoned bitmap ready
for swath cutting.

PCL versus PPA 

Fig. 2 shows the printing model for PCL printers. For PCL
printers, the process of encapsulating the halftoned bitmap
is fairly straightforward. Raster data from the halftoned
bitmap is compressed, PCL wrapped, and then sent to the I/O
module. The reason that this is a simple process is that PCL
printers are designed to receive data in the same format as
the halftoned bitmap. PCL printers unwrap the data into an
internal buffer and perform the necessary swath cutting and
data formatting internally.

Fig. 3 shows the printing model for PPA printers. For the HP
DeskJet 820C, the PCL encapsulator is replaced with an SCP
data encapsulator. SCP (Sleek Command Protocol) is an
HP-proprietary command language. This module contains
swath cutting functionality, data formatting, SCP language
encapsulation, and printer status management.

Raster data from the halftoned bitmap comes into the SCP
data encapsulator, goes through the SCP manager, and
eventually arrives at a raster block within the swath manager.
The swath cutting state machine examines the data and
determines the appropriate sweep to generate. A sweep is a
collection of rasters appropriate for the printer mechanism
to print while it sweeps the printhead over the paper.

Once the sweep is generated, it is given to the sweep
formatter. The sweep formatter is responsible for taking the
sweep data and putting it into the appropriate format for the
HP DeskJet 820C internal hardware. Then the data is
compressed, wrapped in SCP, and handed off to the I/O layer.

The I/O layer is responsible for communicating with the
printer by wrapping the data stream in VLink and IEEE 1284
protocols. VLink is an HP-proprietary link-level protocol and
IEEE 1284 is an industry-standard physical-layer protocol.

Performing Swath Cutting on the Host 

Swath cutting is the process of taking a page of halftoned
raster data and producing sweep data appropriate for the
carriage electronics to print as the printhead is sweeping
across the page. Swath cutting has historically been part of
printer firmware, but in the HP DeskJet 820C printer, it is
part of the software driver running on the host computer.
Typically, a swath manager encapsulates a swath cutting
engine and receives as input a bitmap representation of the
page to be printed. The swath manager is responsible for
determining how the pens and paper should be moved and
when and how the pens should be fired to produce the
printed page. The swath manager must balance the often
conflicting goals of printing with the highest possible print
quality and printing as fast as possible. The swath manager
must be aware of certain printer-specific attributes such as
printhead alignment and strategies to minimize line feed
error. In PPA, swath management is performed on the host
computer.

Fig. 1. Printer driver functional block diagram, showing differences
between PCL and PPA data paths.

The process of swath cutting can be readily modeled using
a state machine. Consider the example shown in Fig. 4. A
state machine capable of processing this page would need to
contain five states: Top of Page, Blank Skipping, Black Text Printing,
Color Graphic Printing, and End of Page. Thus, we can create the
state machine shown in Fig. 5. A particular instance of a
state machine exists for each print mode the swath manager
supports. For example, there could be a print mode for
pages that only have black text on them, another print mode
for pages with black and color, and yet another print mode
for pages with complex graphic images.

As the state machine begins to examine the data on the 
page, it starts in the Top of Page state. The first data it comes 
to is a series of blanks. This would cause it to move to the 
Blank Skipping state. During this transition the swath manager 
would typically load the page. While in the Blank Skipping 
state, the swath manager would advance the paper. Next, it 
would encounter a black text region and move to the Black 
Text Printing state. Depending upon the type of printing being 
done at that time, this transition may produce a sweep. 
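
The walk through the page described above can be sketched as
a small state machine. This is a minimal illustration, not
the driver's implementation: the region labels ("blank",
"black", "color"), the action strings, and the assumption
that the page has already been classified into regions are
all invented for the example, and the sketch ignores
multipass print modes.

```python
from enum import Enum, auto

class State(Enum):
    TOP_OF_PAGE = auto()
    BLANK_SKIPPING = auto()
    BLACK_TEXT_PRINTING = auto()
    COLOR_GRAPHIC_PRINTING = auto()
    END_OF_PAGE = auto()

# Hypothetical mapping from a classified page region to the state that handles it.
REGION_TO_STATE = {
    "blank": State.BLANK_SKIPPING,
    "black": State.BLACK_TEXT_PRINTING,
    "color": State.COLOR_GRAPHIC_PRINTING,
}

def cut_page(regions):
    """Consume (kind, rows) regions top to bottom; return final state and actions."""
    state = State.TOP_OF_PAGE
    actions = []
    for kind, rows in regions:
        next_state = REGION_TO_STATE[kind]
        if state is State.TOP_OF_PAGE:
            actions.append("load page")            # done on the first transition
        if next_state is State.BLANK_SKIPPING:
            actions.append(f"advance paper {rows} rows")
        else:
            actions.append(f"sweep {kind} region, {rows} rows")
        state = next_state
    state = State.END_OF_PAGE
    actions.append("eject page")
    return state, actions

final_state, actions = cut_page([("blank", 40), ("black", 300), ("color", 64)])
print(final_state.name, actions)
```

A separate table of transitions per print mode would mirror
the article's point that each print mode has its own state
machine instance.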

Assume that for this print mode, the data on the page is
being printed by making two sweeps for each line. Thus, in
making the transition from Blank Skipping to Black Text Printing
the printer could print the first pass of the black text region
with the bottom half of the printhead, advance the paper
half a printhead height, and then enter the Black Text Printing
state. During the next sweep generated, the Black Text Printing
state would finish the lines that were printed during the
transition and continue printing the black text region
(see Fig. 6). The data on the page would continue to be
consumed and transitions made between states until the
End of Page state is reached.

Fig. 3. PPA printing model.

Obviously, this example is a simple one. The number of
states and the number of transitions to consume data for a
real page can be quite large. Using PPA, we have the
opportunity to perform the resource-intensive task of swath
cutting on the host. This allows greater flexibility in developing
machines with unique print modes, which provides the
opportunity for higher print quality and throughput as well
as reduced mechanism costs.

Fig. 4. Swath cutting state machine transitions for a typical page.

PPA Data Formatting

The HP DeskJet 820C's Printing Performance Architecture
requires the host to perform the majority of the data
manipulation. The data that is sent to the printer must be in a
format that is very close to the final form used to fire the
printheads. The main difficulty in formatting the data for the
printhead lies in the fact that the data doesn't come out of
one position on the carriage mechanism. Instead, there are
two columns for each of the four pen colors. Each column is
at a different vertical and horizontal offset from a relative
zero carriage position. To minimize the cost and complexity
of the electronics in the printer mechanism, the data sent
from the host to the printer must be ordered so that it is
ready to go directly into these offset printheads in the
appropriate order so that it is fired at the correct locations
on the page. This ordering is based on:

• The starting page position of each color 

• The servant architecture in the printer hardware (described
later)

• The printhead (see Fig. 7). 




Fig. 5. Swath cutting state machine.

Fig. 6. (a) In making the transition from Blank Skipping to Black Text
Printing, the printer prints the first pass of the black text region with
the bottom half of the printhead, advances the paper half a
printhead height, and then enters the Black Text Printing state. (b) During
the next sweep generated, the Black Text Printing state finishes the
lines that were printed during the transition and continues printing
the black text region.

To print a page, it is necessary for the carriage mechanism
to move back and forth across the page, firing drops of ink
as it moves. Each movement of the carriage across the page
is called a print sweep. When the driver receives a page
to print from some application, it renders the page into a
halftoned bitmap. At this point, a PCL printer driver would
send compressed and encapsulated PCL data directly to the
printer. The PPA printer driver uses the swath cutting state
machine to generate a swath of data that can be printed by
a single pass of the pen carriage. The resulting swath of data
is passed on to the sweep formatter, which manipulates the
data into a buffer that can be copied directly to the
printheads. The print sweep formatter uses knowledge of the pen
carriage, hardware, and firmware architecture to prepare
and reformat the data into a print sweep.

The number of print sweeps required on a given page is
dependent upon:

• The amount of data on the page (text or dense graphics)

• The print mode selected by the user (best, normal, or
econofast)

• The paper type (plain, glossy, transparency, or special).

For each print sweep, the host sends two pieces of
information to the printer. The first is the PRINT_SWEEP data, a
buffer of image data sent before the PRINT_SWEEP command, which
contains an entire sweep of swing buffer data blocks in
the correct order. The second piece of information is the
PRINT_SWEEP command, the mechanism by which the driver
tells the printer where and how to place the print sweep
data on the page. A PRINT_SWEEP command contains minimum
and maximum positions for each pen column, the
print direction, print speeds, and NEXT_PRINT_SWEEP
information.

The PRINT_SWEEP command information is calculated by the
printer driver based upon:

• Which pens are active (black, cyan, magenta, yellow)

• The starting and ending locations on the page for each pen
color

• The direction of the print sweep

• The servant architecture:

o The distances between pens
o The distances between odd and even columns within a
pen
o The 0,0 position in relation to the pen columns.
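
As a rough illustration of how these inputs combine, the
sketch below derives minimum and maximum carriage positions
for each active pen column from the page extents of each
color and the column offsets. Everything here is an
assumption made for the example: the units (dot positions),
the offset values, the sign convention (a column mounted at
offset d from the carriage's 0,0 point reaches page position
x when the carriage is at x - d), and all of the names. The
actual PRINT_SWEEP field layout is not given in this article.

```python
from dataclasses import dataclass

@dataclass
class ColumnRange:
    min_pos: int   # carriage position where the column starts firing
    max_pos: int   # carriage position where the column stops firing

def column_ranges(extents, column_offsets):
    """Map page extents per color to carriage ranges per pen column.

    extents:        {color: (start, end)} page positions, active pens only
    column_offsets: {(color, parity): offset of that column from carriage 0,0}
    """
    ranges = {}
    for (color, parity), offset in column_offsets.items():
        if color not in extents:
            continue                      # pen not active in this sweep
        start, end = extents[color]
        ranges[(color, parity)] = ColumnRange(start - offset, end - offset)
    return ranges

# Hypothetical offsets, in dot positions, for two cyan columns and one black column.
offsets = {("cyan", "odd"): 10, ("cyan", "even"): 14, ("black", "odd"): 0}
ranges = column_ranges({"cyan": (100, 500)}, offsets)
print(ranges)
```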
Servant Architecture 

The servant hardware (see article, page 31) is composed of
a pair of buffers, called swing buffers, for each column of
the printhead (two columns per color). To build a print 
sweep, the driver must: 

• Separate the image into CMY planes, or primitive data 
blocks 

• Separate the primitive data blocks into swing buffer data 
blocks 

• Order the swing buffer data blocks into a servant image. 

A primitive data block (a bitmap image of each plane for 
each color) is created by the driver. Each primitive data 
block needs to be split into two separate swing buffer data 
blocks: an odd block and an even block. This is necessary 
because of the pen design, which consists of two offset 
columns, as pictured in Fig. 8. 

Each column on the color pen has 32 nozzles. The color pen
has a height of 64/300 inch. For any given column of data,
rows 1, 3, 5, ..., 63 will be part of the odd column and rows
2, 4, 6, ..., 64 will be part of the even column.
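
The odd/even split can be sketched in a few lines. This is an
illustration only: it assumes a primitive data block is held
as a list of rows, one byte (8 pixels) per row, with row 1
first. The function name is hypothetical.

```python
def split_odd_even(primitive_block):
    """Split a primitive data block into odd and even swing buffer data blocks.

    primitive_block[0] is row 1, so even list indexes hold the odd-numbered rows.
    """
    odd_rows = primitive_block[0::2]    # rows 1, 3, 5, ...
    even_rows = primitive_block[1::2]   # rows 2, 4, 6, ...
    return odd_rows, even_rows

# A 64-row block whose byte value records the row number, for easy checking.
block = [row_number for row_number in range(1, 65)]
odd_block, even_block = split_odd_even(block)
print(len(odd_block), odd_block[:3], even_block[-3:])
```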

The even and odd swing buffer data blocks are each 8 bits
wide, the width of servant RAM, and each is the height of a
printhead nozzle column. Swing buffer data blocks are cut
for each primitive color and for either the even or odd
nozzle column. Thus, each swing buffer data block contains
every other row from the primitive data block.

Fig. 7. HP DeskJet 820C print cartridge layout. The lines correspond
to nozzle columns and their general configuration on the printer
carriage.

Fig. 9. Primitive data block organization for a printhead that
has two columns of six nozzles per color. Byte n (n = 0, 1, 2, 3,
4, 5) is a buffer of data 8 pixels wide by 6 rows (nozzles) high.
The HP DeskJet 820C printheads have two 32-nozzle columns
per color, as shown in Fig. 8.

Fig. 9 shows a simplified example of a primitive data block.
Each byte is a buffer of data that is one byte (8 pixels) wide
by N rows high, where N is the number of nozzles in a
printhead column. For the example in Fig. 9, N is 6, while N is 32
for the HP DeskJet 820C color printheads.

Each column of the primitive data block in Fig. 9 is divided 
into four swing buffer data blocks with bytes relocated to
the positions shown in Fig. 10. Only the cyan pen is shown, 
and only two of the swing buffer data blocks for each col- 
umn of Fig. 9 are shown. The drawing would be similar for 
the magenta and yellow pens. 

Once the data is in the form of even and odd swing buffer 
data blocks, the blocks must be ordered and sent to the 
printer. This ordering is done with knowledge of the column 
spacing on the printhead and knowledge of the order in 
which the servant architecture will require the data. The 
printer driver controls the order in which the columns will 
trigger via fields in the PRINT_SWEEP command. The ordered 
swing buffer data blocks are then sent down as PRINT_SWEEP 
data ready to be loaded into the primitive swing buffers in
the printhead. 

Fig. 10. Swing buffer data blocks (byte 0 and byte 1) for the
example primitive data block shown in Fig. 9.



Each primitive swing buffer consists of two 8-bit columns,
separated by a swing trigger point. While the servant print
process is unloading one side of the odd column swing buffer,
the other side of the odd column swing buffer is being loaded
by the servant load process. Once the byte is loaded, the
servant print process fires one bit by 32 rows at a time for
each pen column in the color pen. When the servant print
process has unloaded all eight bits, it crosses a swing trigger
point, and the servant print process switches to the other
swing buffer and triggers the servant load process to load the
empty swing buffer. The servant (printer) is responsible for
any complexity involved below the byte level.
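
The alternation between the two sides of a swing buffer can
be simulated for a single nozzle column. The sketch below is
a sequential software model under stated assumptions, not a
description of the hardware: loading and unloading happen one
after the other rather than concurrently, each block is a
list of row bytes, and the most significant bit is assumed to
be the first pixel fired.

```python
def fire_sequence(data_blocks):
    """Yield one tuple of nozzle bits per firing for a single nozzle column.

    data_blocks: iterable of swing buffer data blocks, each a list of
    N integers (one byte, 8 pixels wide, per nozzle row).
    """
    blocks = iter(data_blocks)
    sides = [next(blocks, None), None]    # side 0 preloaded, side 1 empty
    active = 0
    while sides[active] is not None:
        # Load process: fill the idle side while the active side is printed.
        sides[1 - active] = next(blocks, None)
        # Print process: fire one bit by N rows, eight times per byte.
        for bit in range(7, -1, -1):      # assumption: MSB is fired first
            yield tuple((row >> bit) & 1 for row in sides[active])
        sides[active] = None              # side is now empty
        active = 1 - active               # crossing the swing trigger point

firings = list(fire_sequence([[0b10000000, 0b00000001], [0xFF, 0x00]]))
print(len(firings), firings[0], firings[7])
```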

When all of the swing buffer data blocks have been con- 
sumed by the printhead, the carriage mechanism uses the 
NEXT_PRINT_SWEEP information to position itself for the start 
of the next print sweep. 

Because the PPA printer relies upon the driver to format the 
data appropriately, the architecture does not require the 
printer firmware to have any knowledge of the operations 
just described. Thus, the cost and complexity of the
electronics in the printer mechanism are significantly reduced.

PPA Communication 

One of the goals of the HP DeskJet 820C printer is to
provide continuous feedback to the user during any printing
operation, and to guide the user during problem solving. To
accomplish this, the driver requires a mechanism to ask the
printer for information and to allow the printer to notify the
driver whenever something happens (the printer is out of
paper, the user opened the cover, etc.). The mechanism used
by the PPA driver to communicate with the printer is called
status messaging.

To notify the user to align the print cartridges when a print 
cartridge has been changed, that the top cover is open, or 
that something else needs attention, a bidirectional link with 
the printer is required. Two new HP-proprietary protocols 
allow the driver to communicate bidirectionally with the HP 
DeskJet 820C: VLink packet protocol and Sleek Command
Protocol (SCP). Previous HP DeskJet printers used an I/O
packetizing protocol called MLC (Multiple Logical Channel)
and a proprietary HP printer command protocol. For PPA,
VLink replaces MLC, and SCP replaces both PCL and the old
printer command protocol. 

While giving users error messages might seem to be a luxury
they could do without, the real reason to have a protocol
like VLink is that it is useful to figure out what is wrong
when, for example, the printer's input buffer fills up, the
printer stops accepting data, and the host is unable to send
even one more byte. This often happens and is temporary,
but in the days before bidirectional protocols, the driver
would sometimes wait and wait to be allowed to send again,
and it didn't know whether the delay was because the top
cover had been opened, a print cartridge had failed, or a
fatal error had occurred. It is helpful to know whether to
abort the job or ask the user to insert a print cartridge or
close the door. With a bidirectional protocol, the printer tells
the driver exactly what the problem is, and the driver can
decide what action to take next.

Fig. 11. PPA status messaging architecture.



Data that is sent by the printer, such as notifications that
something is wrong, is put in the printer's output buffer.
The driver spawns a hidden executable at the beginning of
each print job called the port sniffer, which checks the port
every half second to determine if the printer has sent any
data. If so, the data is routed through the IEEE 1284 layer
to the VLink layer, which then posts a message to the I/O
manager's hidden status window.

The status window uses a callback to call into the SCP
manager, which translates the status information, and if the
message is something that should be displayed to the user, puts
it on the event list. The event list prioritizes the messages on
it so that the most important message gets sent to the HP
Toolbox, which displays the dialog box to the user. If the
message is an error, it may get resolved (for example, the
user puts paper in the printer and presses the Resume button).
The message is then routed up through the same path and
deleted from the event list. The Toolbox takes the dialog box
down and displays the next most important message, if
there is one.

Internal Objects in PPA Status Messaging 

PPA status messaging involves several high-level modules 
and objects: the SCP (Sleek Command Protocol) manager, 
the I/O manager, the VLink module, and the event list (see 
Fig. 12). 

SCP Translator. The function of the SCP translator object in 
the SCP manager is to encode data into the SCP format and 
decode messages received in the SCP format from the printer 
into query replies and event information. The SCP translator 



A bidirectional link is not required for printing or to have
limited status feedback from the printer. However, unlike
PCL printers, which can accept either PCL data wrapped in
MLC or raw PCL data, PPA printers can only interpret data
wrapped in VLink and SCP. Thus, while MLC is an option
that can be added when a bidirectional link exists, VLink
must handle printing with and without a bidirectional link as
well as printing to a file.

Based on VLink's channelization features, there are two
paths the data can take to the printer. One is for image data
(the dots that will go on the page), and the other is for
command data. Command data includes commands sent to the
printer, such as "Print this sweep," requests for information,
or queries, such as "What print cartridges are installed?",
and status information, termed autostatus, such as "The top
cover is open." Sending image data is easy from an I/O
standpoint: if the printer has room in its buffer, the driver
will send the data. Since command data must be sent and
also received (autostatus may come in at any time), it is by
nature a more complex affair.

As shown in Fig. 11, data that comes in from the front end
of the driver goes through the data encapsulator, like PCL
printer drivers, but from there it goes through several new
objects. The SCP manager wraps the data in SCP and sends
it to the I/O manager, which provides an interface to the
datacomm objects. The VLink layer wraps the data in the
VLink protocol and sends it to the IEEE 1284 layer and out
to the printer.

Fig. 13. SCP command format.

does not send SCP data directly to the I/O manager, since
memory management for the data buffers is done in the SCP
translator's clients, which are the swath manager and the
status manager. The client of the SCP translator passes in a
pointer to the data, an empty buffer, and the maximum data
length. Once the data has been packaged, if the SCP
translator finds that the data is larger than the buffer, it will
return an error. Otherwise, it will pass back the actual SCP
data length. The goal in designing the SCP translator was to
encapsulate the Sleek Command Protocol so that changes in
SCP in the firmware affect clients of this module as little as
possible.

Commands in SCP use the format shown in Fig. 13. The
command specifier field identifies the SCP command. The
length field indicates the number of bytes in the data field.
The data field does not exist for every command.
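
A command of this shape, together with the client-supplied
buffer contract described earlier, might be packaged as in
the sketch below. The field widths are not stated in the
text, so the 2-byte big-endian command specifier and 2-byte
length used here are assumptions, as are the function names,
the exception type, and the example specifier value.

```python
import struct

HEADER = ">HH"   # assumed layout: 2-byte specifier, 2-byte data length

class BufferTooSmall(Exception):
    """Raised when the packaged command does not fit the client's buffer."""

def encode_command(specifier, data, out_buffer, max_len):
    """Package a command into the client's buffer; return the SCP data length."""
    packet = struct.pack(HEADER, specifier, len(data)) + data
    if len(packet) > max_len or len(packet) > len(out_buffer):
        raise BufferTooSmall(f"need {len(packet)} bytes")
    out_buffer[:len(packet)] = packet
    return len(packet)

def decode_command(packet):
    """Return (specifier, data) from a packaged command."""
    specifier, length = struct.unpack_from(HEADER, packet, 0)
    data = bytes(packet[4:4 + length])
    if len(data) != length:
        raise ValueError("truncated data field")
    return specifier, data

buffer = bytearray(16)
used = encode_command(0x0012, b"\x01\x02", buffer, max_len=16)
print(used, decode_command(buffer[:used]))
```

A command with no data field is simply encoded with an empty
`data` argument, giving a length field of zero.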

Priorities. Priorities allow the printer to execute commands
in a different order than received. This may be necessary
when a command cannot complete execution and it is
desirable for the printer to process queries so the driver can find
out what the problem is. Priority levels are defined in the
SCP translator and the clients can set whatever priorities
they like. Standard priority levels are defined as shown in
Table I.

Table I
Command Priorities

Command                                        Priority

Printing Commands                              Low
Queries                                        Medium
Initializing and Reinitializing the I/O Link   High
Recovering from Errors                         Recover
Canceling                                      Cancel
Restarting the Printer                         Restart

It is assumed that the swath manager will send all of its
printing commands (LOAD_MEDIA, PRINT_SWEEP, EJECT_MEDIA)
at the lowest priority. Any queries it needs to make will call
into the status manager. All queries should be at the same
priority and higher than printing commands. It is up to the
clients to set priorities.
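
The effect of priorities on execution order can be
illustrated with a small priority queue. This is a sketch of
the idea only: it assumes the levels in Table I are listed
from lowest to highest, and the class and method names are
invented for the example rather than taken from the firmware.

```python
import heapq
from enum import IntEnum
from itertools import count

class Priority(IntEnum):
    # Assumed ordering: Table I read top to bottom as lowest to highest.
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    RECOVER = 3
    CANCEL = 4
    RESTART = 5

class CommandQueue:
    """Highest priority first; arrival order is kept within one priority."""

    def __init__(self):
        self._heap = []
        self._arrival = count()

    def push(self, priority, command):
        # Negate the priority because heapq pops the smallest entry first.
        heapq.heappush(self._heap, (-int(priority), next(self._arrival), command))

    def pop(self):
        return heapq.heappop(self._heap)[2]

queue = CommandQueue()
queue.push(Priority.LOW, "PRINT_SWEEP #1")
queue.push(Priority.LOW, "PRINT_SWEEP #2")
queue.push(Priority.MEDIUM, "query: installed pens")
order = [queue.pop() for _ in range(3)]
print(order)
```

Here the query overtakes both pending sweeps, which is the
behavior the text describes for a stalled print command.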

Status Manager. The status manager manages messages to
and from the printer. These messages can be broken into
two categories: events and queries. Events are unsolicited
notifications by the printer (i.e., autostatus) that something
has occurred to change the state of the printer, such as "the
door is open." Queries are requests for information made by
the driver to the printer, such as the pen IDs of the installed
pens. The status manager tracks the state of the printer and
creates events when state changes occur. For example, when
the Resume button is pressed, an internal state change occurs.
This state change is recognized by the status manager and
reported as an event to the event translator.

When the status manager receives notification of an event, it
determines what has changed and whether the event is
something the event translator has requested to know about. If it
is, a callback in the event translator is called.

Upon starting a print job, the status manager queries the
printer to get the current state of events. No event
notification will be received until an event occurs in the printer.

Event Translator. This module exists between the event list,
which is Windows-specific, and the status manager. The
event translator translates the bit-field data, which is
returned to the status manager by the printer in autostatus,
into events. New events are added to the event list by the
status manager, and events that are no longer valid (e.g., the
door was open but the user shut it) are removed from the
list. The event list orders the events reported to it according
to their importance to the user, and tells the status monitor
which dialog box to display. From most important (1) to
least important (10), the following event priorities are used:
(1) I/O errors, (2) paper jam, carriage stall, or maximum
thermal limit, (3) pen failure, (4) wrong pen, (5) low or out
of ink, (6) pen missing, (7) out of paper, (8) cover open,
(9) dry timer, (10) new pen.
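
The ordering rule can be sketched as a ranked set of active
events. The ranks follow the list above; the event
identifiers and the class interface are hypothetical names
chosen for this illustration.

```python
# Rank 1 is most important, rank 10 least, following the list in the text.
EVENT_RANK = {
    "io_error": 1,
    "paper_jam": 2, "carriage_stall": 2, "thermal_limit": 2,
    "pen_failure": 3,
    "wrong_pen": 4,
    "low_or_out_of_ink": 5,
    "pen_missing": 6,
    "out_of_paper": 7,
    "cover_open": 8,
    "dry_timer": 9,
    "new_pen": 10,
}

class EventList:
    """Tracks currently valid events and reports the most important one."""

    def __init__(self):
        self._active = set()

    def add(self, event):
        self._active.add(event)

    def remove(self, event):
        self._active.discard(event)     # e.g. the user shut the door

    def most_important(self):
        # Ties within a rank are broken alphabetically to stay deterministic.
        return min(self._active, key=lambda e: (EVENT_RANK[e], e), default=None)

events = EventList()
events.add("cover_open")
events.add("out_of_paper")
first = events.most_important()
events.remove("out_of_paper")
second = events.most_important()
print(first, second)
```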

I/O Manager. This module is intended to glue the VLink
module, which is Windows-specific, to the SCP manager, which
is shared. Handling for events, queries, and buffer
management must be performed by the I/O manager in addition to
sending data to the printer as quickly as possible.

Events. The I/O manager creates a hidden window so that
when the printer sends unsolicited event notification,
Windows messages to that effect can be posted to this window
by the VLink module. When the I/O manager processes this
window message, it will read the SCP data buffered by
VLink and call a callback in the status manager, passing in
the SCP data.

Queries. To get replies to queries, the inquiring module calls
VLink, specifying a buffer in which to place the reply. VLink
checks this query reply buffer to see if anything has been
returned in response to the query. If so, it immediately
returns with the SCP data. If not, it polls the incoming channels
for a specified timeout period to attempt to retrieve the reply.
If a reply is received before the timeout period expires, the
SCP data is passed through to the status manager.

Datacomm Paths. The image and command datacomm paths
send data to the printer as long as there is space in the buffer.
If space runs out, the command datacomm path waits until
more space becomes available. The image data is handled
differently. If space runs out while sending image data, the
image datacomm path returns to the caller, allowing it to
render more swaths until more space becomes free in the
printer.

VLink. The VLink module must package data in a protocol
the printer recognizes, and send only as much data as the
printer can take, as quickly as possible. VLink must also
unwrap data from the printer and route the messages to the
appropriate clients.

The VLink protocol replaces MLC (Multiple Logical Channels)
for the HP DeskJet 820C. Like MLC, VLink's intent is to
provide a way for the host and the peripheral to exchange data.
Unlike MLC, VLink is not optional. All data going to the
printer must be wrapped in its protocol. In addition, VLink is
streamlined or "sleek," and doesn't have many of MLC's
features. MLC supported multiple logical channels, while VLink
supports two outgoing and three incoming channels.

Outgoing Channels. The printer accepts data in either its input 
buffer or its command buffer. The VLink module specifies 
which type of data it is sending through a field in the VLink 
packet header. A template of a VLink packet is shown in 
Fig. 14. 

Image data is sent to the printer's input buffer on the image 
data output channel. Commands and queries are sent to the 
command buffer on the command data output channel. 

Incoming Channels. Since a bidirectional link cannot be
guaranteed, all incoming data is optional. This is necessary for
file dumps and bad cables, and miscellaneous
communication problems.

The printer periodically notifies the host how much buffer
space is left in the printer. This is known as credit, and the 
printer sends notification for both the command and input 
buffers on the credit input channel. The VLink module will 
not send more data than the available credit. 
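
As a rough sketch of the credit rule, the host side can be modeled as a counter that a credit packet overwrites and that each transmission draws down. The class and method names here are hypothetical, and treating each credit packet as the absolute free space (rather than an increment) is an assumption based on the wording above.

```python
class CreditedChannel:
    """Tracks printer buffer credit for one outgoing channel (hypothetical)."""

    def __init__(self):
        self.credit = 0   # bytes the printer has said it can still accept
        self.sent = []    # stands in for the bytes written to the port

    def grant(self, nbytes):
        """Record a credit packet: the printer reports its free buffer space."""
        self.credit = nbytes

    def send(self, data):
        """Send at most `credit` bytes and return how many were sent."""
        chunk = data[: self.credit]
        if chunk:
            self.sent.append(chunk)
            self.credit -= len(chunk)
        return len(chunk)
```

A caller that gets back fewer bytes than it offered knows the printer buffer is full and can either wait for more credit or go off and do other work, such as rendering further swaths.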

VLink accepts two types of data packets from the printer in
addition to credit packets: query replies, which are expected
on the status input channel, and a collection of bundled
items regarding printer status (such as out of paper), called
autostatus messages. Autostatus messages ultimately map
to events.

An autostatus message from the printer consists of a bit
collection of several long words representing the current
state of the printer. For example, when the door is opened,
the door open bit in the collection is set to true. A report is
generated on the autostatus input channel when any of
these bits is toggled.
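
Turning such a bit collection into event additions and removals amounts to comparing the new status word with the previous one. The sketch below shows the idea for a single word; the bit assignments and names are invented for the example, since the real autostatus layout is not given here.

```python
# Hypothetical bit assignments; the real autostatus layout is not given
# in the text.
STATUS_BITS = {
    0x01: "cover_open",
    0x02: "out_of_paper",
    0x04: "paper_jam",
}


def translate(previous, current):
    """Compare two autostatus words and return (added, removed) event sets."""
    changed = previous ^ current          # bits that toggled since last report
    added, removed = set(), set()
    for mask, event in STATUS_BITS.items():
        if changed & mask:
            # A toggled bit that is now set is a new event; one that is
            # now clear means the condition went away.
            (added if current & mask else removed).add(event)
    return added, removed
```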

When the VLink layer receives some data, the data is
identified as either credit, a query reply, or an autostatus message.
Credit is interpreted and handled within the VLink module.
A query reply or an autostatus message is buffered internally
so that the clients can read it later.

If a received message is an autostatus message, the VLink
layer posts a Windows message to the I/O manager indicating 
that an autostatus message is waiting to be read. When the 
I/O manager processes the Windows message, it reads the 
buffered autostatus message. Posting a message is necessary 
so that VLink can be free to poll the data lines for more 
incoming data from the printer. 

Fig. 14. VLink packet format.

Once the buffered message has been read, it is deleted. Only
one query reply and one autostatus message can be buffered
at a time. If a new message comes in before the original
message can be read, the new message replaces the old one.
It is for this reason that no additional printer queries should
be made while waiting for a reply. No harm is done if a new
autostatus message overwrites the old message because
the same information is contained in each message and the
newest message is the most relevant.
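
The one-deep buffering behavior is easy to picture as a single slot that is overwritten on arrival and emptied on read. The following is a minimal sketch with a made-up class name, not the VLink module's actual data structure.

```python
class SingleSlot:
    """One-message buffer: a newly arrived message replaces an unread one."""

    def __init__(self):
        self._message = None

    def put(self, message):
        # No queueing: whatever was waiting is silently replaced.
        self._message = message

    def take(self):
        """Return the buffered message (or None) and delete it from the slot."""
        message, self._message = self._message, None
        return message
```

For autostatus this loss is harmless because each message carries the full state. For query replies it is not, which is why a second query issued before the first reply is read would lose the first reply.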

PCL Emulation for DOS Application Support 

The development period of the HP DeskJet 820C coincided
with most users rapidly transitioning away from DOS appli- 
cations towards Windows applications. While we expected 
that most users would use the printer in its optimized design 
center, we recognized that we needed an adequate bridge to 
the few DOS applications that would continue to be used. 

The HP DeskJet 550C printer was the final printer to be
supported by most DOS applications, so the solution had to be
functionally compatible with this printer and provide equally
good print quality. We chose to provide compatibility with
the HP DeskJet 660C printer, which was a contemporary
printer that satisfied these requirements and provided an
internal interface that enabled us to separate the PCL
personality from the printer engine firmware. We planned to
port the PCL personality functions to the HP DeskJet 820C
printer driver, encapsulating them in a PCL emulator module.
The required printer-engine functions would then be supplied
by the rest of the HP DeskJet 820C driver. In this way, we
could minimize design changes and maximize the chances of
identical compatibility. If a DOS application is run from an
MS-DOS prompt window, also referred to as a DOS box, the
printer driver can intercept the PCL data stream that the
DOS application sends to the PC's parallel port and redirect
the data stream to the PCL emulator.

The HP DeskJet 820C PCL emulator encapsulates the HP
DeskJet 660C formatter and text engine code. The design of
the HP DeskJet 660C firmware was such that all interfacing
to the external mechanism was done through a well-defined
API internally known as the Ed Interface (see Fig. 15).

The Ed Interface resides between the formatter and font 
manager and the rest of the firmware. It is a collection of 
function calls to the support code in the firmware. Since we 
reused the formatter and font manager code, we provided
the equivalent firmware functionality by mapping the Ed 
Interface calls into HP DeskJet 820C support objects. 

The functions of the formatter and text engine firmware
code were written in C, and as such are functions in the PCL 
emulator application (Fig. 16). The PCL emulator applica- 
tion provides C++ objects that encapsulate the functionality 
expected by the Ed Interface. 

The PCL emulator application is designed to receive a file 
name that contains the PCL data to operate on. Interfacing 
between the internal PCL emulator object and the external 
driver is provided through a PCL personality object. 

The PCL emulator is implemented as an executable
application because the original firmware code expects to be a
separate task, and this implementation allows almost direct
porting of the HP DeskJet 660C firmware code. The PCL
personality provides the handler functions and the external
interface for receiving the PCL file name.

Fig. 15. PCL emulation is provided in the HP DeskJet 820C printer
by mapping the existing Ed Interface calls to DeskJet 820C support
objects.



To allow DOS applications to print to the HP DeskJet 820C,
it is necessary to capture the data generated by the DOS
applications. This process is referred to as DOS box
redirection. Essentially, it is necessary to capture the bytes intended
for the parallel port and put them into a file so that the PCL
emulator can properly interpret the data.

Under Windows 3.1, DOS box redirection is not part of the
operating system, so it was necessary for us to provide a
redirection solution. This functionality is provided by a
redirector VxD (virtual device driver), a redirector DLL
(dynamic link library), and a redirector EXE (executable),
as shown in Fig. 17. These three pieces capture the data
stream and put it into a temporary file. This file is then
handed to the driver, and the driver hands it to the PCL emulator.

Under Windows 95 (Fig. 18), DOS box redirection is provided 
by the Windows printing system, so our redirector solution 
is not necessary for spooling to work under Windows 95. 
PCL printers essentially get DOS box redirection free. PPA 
printers need to intercept and perform PCL emulation on 
the DOS data stream. Microsoft provides a replaceable mod- 
ule called a language monitor where the data stream can be 
intercepted. The language monitor is a 32-bit DLL called 
directly by the spooling subsystem. The language monitor
takes the incoming buffers, writes them to a temporary file,
and passes the file name to the driver. 

Porting the Firmware 

The process of porting the C-language code from the HP
DeskJet 660C presented several challenges. The original
firmware was developed for a Motorola 68000 processor,
while the printer driver runs on the Intel 80x86 processor
in Windows 16-bit mode.

These two hardware platforms have conflicting ways of
addressing memory for data types larger than a byte: the
former is big endian (the most significant byte comes first)
and the latter is little endian. As long as a data element is
consistently accessed with the same data type, there is no
problem. However, there are places in which a data type is
written as several single bytes, then read as 2-byte or 4-byte
quantities. We needed to identify and change the code in
these places.
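
The kind of mismatch described above can be reproduced in a few lines. This sketch is only an illustration (the function names are made up, and the original code was C rather than Python): a value written byte by byte in 68000 order reads back differently when the same two bytes are fetched as one little-endian word.

```python
def store_msb_first(value):
    """Write a 16-bit value as two single bytes, most significant byte first."""
    return bytes([(value >> 8) & 0xFF, value & 0xFF])


def read_as_word(buf, byteorder):
    """Read the two bytes as one 16-bit quantity in a given CPU byte order."""
    return int.from_bytes(buf[:2], byteorder)


def read_portable(buf):
    """Rebuild the value from individual bytes, independent of CPU order."""
    return (buf[0] << 8) | buf[1]
```

Reading with "big" models the original processor and returns the stored value; reading with "little" models the 80x86 and returns the bytes swapped. Assembling the value from single bytes, as in read_portable(), is one way to make such code behave the same on both platforms.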

The original font data that described the glyph (shape)
information for the text engine was a single block of 250K bytes
of read-only data. This block was mapped to five blocks of
resource data, since each block had to be less than 64K
bytes for Windows 16-bit mode. These blocks are
discardable, meaning that the operating system loads them when
it needs to read some data, but can replace them with other
code or resource blocks when Windows has run out of
memory.

The original firmware's text engine depended on a special
hardware component that rotated font glyph data from
horizontal to vertical orientation, could double the size of the
data, and smoothed the edges of a glyph using several rules
for HP Resolution Enhancement technology (REt). Because
this hardware was not available to the printer driver, we
simulated the first and second of these functions
in software. We determined that the print quality would still
be better than the HP DeskJet 550C even if we did not
simulate the REt rules. The resulting software simulation executes
more slowly, but the original firmware design included a font
cache, which minimizes the number of times that we need
to execute this function.
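
To give a feel for the two functions that moved into software, here is a small sketch of rotating and doubling a glyph bitmap. It makes several assumptions that are not stated in the text: the glyph is held as a list of rows of 0/1 pixels, the rotation is 90 degrees clockwise, and doubling replicates each pixel into a 2-by-2 block. The actual driver code was written in C and worked on packed bits.

```python
def rotate_clockwise(glyph):
    """Rotate a row-major bitmap 90 degrees clockwise."""
    # Reversing the rows and then transposing yields a clockwise rotation.
    return [list(column) for column in zip(*glyph[::-1])]


def double_size(glyph):
    """Double both dimensions by replicating each pixel into a 2x2 block."""
    out = []
    for row in glyph:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))   # second copy of the widened row
    return out
```

Because every glyph would otherwise pass through loops like these each time it is printed, caching the converted result, as the font cache does, keeps the per-character cost low.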

Some further syntax modifications were necessary. The
printer driver is capable of supporting more than one of the
same printer, for example, a printer on the LPT1 port and
another on the LPT2 port, and these printers can be printing
at the same time. For Windows to be able to execute multiple
instances of the PCL emulator, the code must be compiled in
the Windows medium-memory model. This required that
many C-language pointer variables be designated far pointers
rather than the more efficient near pointers. Also, some
subtle syntax correction was necessary because an integer
data type is 32 bits for the 68000, but 16 bits for the 80x86.

The PCL emulation implementation was accomplished in a
staged development process. Two months before the first
printer driver components to support the HP DeskJet 820C
became available, we were able to build a DOS application
that was totally decoupled from a printer driver. It would
accept a test input stream of PCL data and map the input to
an output file of raster data, which could be printed on the
HP DeskJet 850C, which was mechanically identical to the
target HP DeskJet 820C. Using our test center's extensive
suite of input test files, we were able to stabilize the porting
implementation, within the limits of the DOS application.
For example, we noticed that the DOS memory allocation
algorithm would fragment memory that was being continually
allocated and freed, so that eventually a memory allocation
request would fail. However, when we moved on to a
subsequent stage in which we depended on the Windows memory
manager, we found that this memory fragmentation no longer
occurred. Once the DOS port was stabilized, we integrated
the PCL personality into the printer driver, using the HP
DeskJet 850C output target path, while still providing an
input file of PCL. Next we introduced and stabilized the DOS
redirector input path. When the HP DeskJet 820C output
target path finally became available, we were able to switch
to it cleanly, and the PCL emulator became an effective tool
to help stabilize the new output target path. Finally, we
completed the target functionality, always building upon a stable
base.

To summarize, by reusing original firmware code we were
able to provide identical PCL functionality for PPA printers.
Providing support for the Ed Interface API allowed the
firmware code to be reused with little design modification.



Fig. 17. DOS box redirection for Windows 3.1.



Windows is a U.S. registered trademark of Microsoft Corporation.



PPA Printer Firmware Design



Hewlett-Packard's new Printing Performance Architecture (PPA) includes a 
significantly reduced set of printer firmware. "Don't touch the dots" was 
the firmware designer's golden rule. This means that the firmware and 
processor do only mechanism control, I/O, command parsing, status 
reporting, user interface, and general housekeeping functions. 

by Erik Kilk 



A significant factor in Hewlett-Packard's new Printing
Performance Architecture (see article, page 6) is the reduction
of the processing power embedded in the printer. Using the
host PC for all image formatting leaves only motor, print
cartridge, I/O, user interface, command, and status functions
to be controlled by the firmware. This results in significant
cost savings by reducing processor needs and by reducing
ROM and RAM requirements. The goal, which was achieved,
was to reduce the ROM requirements to 64K bytes.

Fig. 1 shows the traditional firmware architecture used in 
HP DeskJet printers. The firmware receives from the host 
PC a combination of text, text formatting commands, and 
raster graphics data. This is formatted according to the 
Hewlett-Packard PCL printer language specification. The 
information to print arrives at a page description level,
which requires firmware to rasterize a bit image, generate
and place fonts, and format and cut the image into swaths
according to the requirements and format of the print
cartridge.

At the I/O layer, previous HP DeskJet printers make use of
the Multiple Logical Channel packetizing layer (MLC, being
proposed as IEEE standard 1284.4) to offer multiple
connections between a host and a printer. PCL and an HP
proprietary peripheral status language share the bidirectional
parallel port.

The rasterizing step involves converting text and text
formatting commands into a graphical bit image to be printed.
Separate bit-image planes are created for each of the four
ink colors: black, cyan, magenta, and yellow.

The swath cutting step involves cutting the bit image into
print-cartridge-high swaths, performing image
enhancements such as overlapping print sweeps, and adjusting the
bit-image planes to the particular format required by the
print cartridges used in the printers.

In general, not only does the traditional HP DeskJet
firmware consist of more modules but the modules themselves
are considerably more complex than with the new Printing 
Performance Architecture. 

PPA Firmware Architecture Overview 

The primary goal of the Printing Performance Architecture,
or PPA, is to reduce the price of an HP DeskJet printer while
maintaining or increasing print performance. The digital
electronics portion of this savings is accomplished by
reducing ROM, RAM, and microprocessor costs. ROM is reduced
by moving the rasterization, font, and swath module
functions onto the host's printer driver, and by using streamlined
I/O and command language protocols. RAM is reduced by
requiring only enough RAM for a worst-case print sweep
plus spare RAM for firmware overhead. Microprocessor
costs are held down by reducing the processing, in particular
the data processing, required of the microprocessor. The
"don't touch the dots" concept enabled the use of a low-cost
processor.

The PPA firmware design is rather liberal with the use of
processes to both modularize and parallelize functionality.
Table I shows the eighteen processes used in the HP
DeskJet 820C printer.



Fig. 2. HP DeskJet 820C printer firmware architecture overview.

Fig. 2 shows the firmware architecture of the HP DeskJet
820C. It consists of a small set of communicating modules.
Each module is implemented with a few communicating
processes and interrupt service routines. The processes
communicate through the use of messages (see below).

The I/O module receives data and commands from the host
PC, passing them on to the command module, and transmits
responses and status information back to the PC. The
command module parses and prioritizes the incoming commands
and passes them on to the other modules, most often the
mechanism module, for execution. The mechanism module
receives paper load and eject, print sweep, and print
cartridge servicing commands, performs the requested actions
by controlling the motors and print cartridges, and passes
the results back to the command module. The UI (user
interface) module handles the front-panel state machine,
sending commands to the command module as necessary.
The status module monitors the printer's status,
communicates this status back to the PC via the I/O module, and
keeps the rest of the modules informed of system status.

Processes, Messages, and Operating System

A small and efficient custom operating system manages the
execution of multiple processes and the delivery of
messages from one process to another. The operating system
also provides support for interrupt service routines, delayed
procedure calls, and binary semaphores.

Processes. Multiple, cooperating independent threads of
execution called processes are used to provide priority,
modularity, and parallelism within the PPA firmware
architecture. Individual processes are instantiated with a function
stack, a fixed priority of execution, and a specific set of
broadcast classes. The highest-priority ready process
executes until either a higher-priority process becomes ready to
execute, the current process is blocked waiting for a new
message, or the process is blocked waiting for a semaphore
to be unlocked. The process's broadcast classes indicate
which set of broadcast messages the process wants to
receive. Processes are static and never terminate.

A fundamental architectural concept is that there is a
one-to-one correspondence between a process and a message
queue. In other words, each and every process has its own
queue for messages and no other queue. This concept is
hardwired into the system. There are no facilities for the
creation or use of any other message queues. When a
process requests a message, its context defines which queue is
selected.



Table I

Firmware Processes in the HP DeskJet 820C Printer

Major Firmware Module
I/O            Command    Mechanism                 Status           UI   Other

IO             Parser     Mechanism State Machine   Autostatus       UI   PState
IEEE 1284      Pacer      Walker/Dispatcher         Status Request
VLink Pacing   Executer

Configuration
NV RAM
Execute Data
Test Print
Simple


Messages. Messages form the fundamental communication
method between processes. Physically, messages are fixed- 
size, small blocks of memory. They contain both required 
and optional fields. 

The typical life of a message is as follows. A process
acquires an uninitialized message from the operating system.
The process fills the necessary message fields. The message
is posted to another process with a specific priority. The
receiving process gets the message and performs the action
implied by the message's identity. Depending on flags set
within the message, a response message may be posted
back to the originator or the message may be released back
to the operating system for reuse.
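
The life cycle just described can be modeled in miniature. The sketch below is a toy, not the firmware's operating system: the MiniOS and Message classes, the "_DONE" token suffix, the reuse of the same message block for the response, and the convention that a lower priority number is delivered first are all assumptions made for the example. It does keep two properties from the text: messages come from a fixed pool, and each process has exactly one queue.

```python
import heapq
import itertools


class Message:
    """A small fixed set of fields, loosely following those named in the text."""

    def __init__(self):
        self.token = None      # identity of the message
        self.sender = None     # process that posted it
        self.respond = False   # flag: does the sender want a response?


class MiniOS:
    """Toy model: a fixed message pool and exactly one queue per process."""

    def __init__(self, process_names, pool_size):
        self._pool = [Message() for _ in range(pool_size)]
        self._queues = {name: [] for name in process_names}
        self._order = itertools.count()   # keeps FIFO order within a priority

    def acquire(self):
        """Hand out a free message; IndexError means the pool is exhausted."""
        return self._pool.pop()

    def release(self, message):
        """Clear a message and return it to the pool for reuse."""
        message.token = message.sender = None
        message.respond = False
        self._pool.append(message)

    def post(self, target, message, priority=0):
        # Lower number is delivered first (an assumption of this sketch).
        heapq.heappush(self._queues[target],
                       (priority, next(self._order), message))

    def get(self, process):
        """Return the next message for `process`, or None if none is waiting."""
        queue = self._queues[process]
        return heapq.heappop(queue)[2] if queue else None

    def free_count(self):
        return len(self._pool)


def handle(os_, process, message):
    """Act on a message, then either respond to the sender or release it."""
    if message.respond:
        target = message.sender
        message.token += "_DONE"          # hypothetical response token
        message.sender, message.respond = process, False
        os_.post(target, message)
    else:
        os_.release(message)
```

In this model a request from a command process to an I/O process travels out, comes back as a response, and ends up in the pool again, so the number of free messages is the same before and after the exchange.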

The reception of messages can be gated by a priority or
limited by a timeout or both. Messages can be posted to an
individual process or broadcast to many processes. The posting
of a message can be deferred for a specific time to provide
for periodic actions. Interrupt service routines can only post
messages, so arrangements must be made to acquire their
messages outside of interrupt execution.

Table II shows the message structure. Messages include a
token field, which gives the message an identity or specific
meaning. For example, a command module process requests
raw input data by posting to an I/O module process the
RECV_REQUEST message (a message with its token set to
RECV_REQUEST). A response field indicates which process is
to be posted the result of the message. For example, when
processing the RECV_REQUEST message, the I/O module
process will post a response back to the process mentioned in
the response field. A data pointer field, a size field, and a
recover field associate a block of memory with a message.
The recover field indicates which process is to be notified to
recover the memory block when it is no longer needed. The
use of associated data in this manner allows the firmware to
pass data blocks from process to process and let the final
process recover the data properly.

Table II
Message Structure

Message Field   Size      Description

Token           16 bits   Message identity. For example, RECV_REQUEST
                          indicates this message is a request to
                          receive data.

Sender          32 bits   Sending process's identity.

Response        32 bits   Identity of the process to receive the
                          response to the message.

Data Pointer       bits   Pointer to an associated data block. For
                          example, this could point to a block of
                          input data for a RECV message.

Data Size       32 bits   Number of bytes of data associated with
                          the message.

Recover         32 bits   Identity of the process to recover the
                          associated data.

Flag            8 bits    Indicates whether the message must be
                          responded to or data must be recovered.
                          If a response message, indicates if
                          failure, OK, or an unknown message type.

Misc 1          32 bits   Message-specific information.

Misc 2          32 bits   Message-specific information.

Semaphores. Semaphores provide a mechanism to restrict
access to a shared resource (often global variables) to one
process at a time. They are analogous to a lock on a door.
Semaphores can be instantiated, locked, and unlocked.
There are only a few critical uses of semaphores in the
system. One is for the exclusive use of global configuration
data. Another is for the exclusive use of the general-purpose
memory pool.

Delayed Procedure Calls. Individual functions can be executed 
at a later time via the operating system. The operating
system maintains a list of functions to be executed and at the
appropriate time will execute the functions at a low
interrupt level. Processes can take advantage of this feature
to execute critical code at a higher-priority interrupt level.
Interrupt service routines can take advantage of this feature 
to execute noncritical code at a lower interrupt level. Since 
a list of functions is maintained by the operating system, 
delayed procedure calls can be canceled. The user interface 
module uses deferred procedure calls to implement key 
debouncing. The deferred post feature of message posting 
is implemented by using deferred procedure calls. 

Interrupt Service Routines. Interrupt routines are statically 
installed. In practice, interrupt routines often just post a 
message to wake up a process. For more sophisticated needs, 
interrupt routines can logically suspend until a subsequent 
interrupt. This facilitates designing serial and sequential 
interrupt state machines. 

Memory Management. Memory management is strictly static
with few exceptions. The operating system does not provide
any sort of functionality to allocate or free memory. The
reliability of the system was greatly enhanced by designing
it for static memory use. The I/O module does provide
for the use of its output ring buffer for general-purpose,
restricted memory allocation with function calls such as
Ring_Request() and Ring_Recover(). The restrictions were
imposed for simplicity and because of the ring nature of the
buffer: memory must be allocated in multiples of 4 bytes,
memory must be held for a very short time or the efficiency
of the output buffer will degrade, and although memory can
be returned piecemeal, the pieces must start on 4-byte
boundaries and be multiples of 4 bytes.

Firms (Soft Constants). A firm is a concept added to the
firmware design to facilitate adjusting constants postrelease.
Constants that may need adjustment after the printer has
been released for manufacture are grouped together in lists.
Access to these constants is via the Firm() function call. Firm()
is called with a list and a constant identifier. Firm() looks up
and returns the desired constant. Firm() also quickly scans a
small constant replacement list. This replacement list
includes the original list and constant identifier along with a
new value for the constant. If a replacement exists, Firm()
returns the replacement. The constant replacement list is
stored in nonvolatile memory. Generally, writing the
replacement list would occur as a final step in the
manufacturing process.
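
The lookup-with-override behavior of Firm() can be sketched as follows. This is a minimal illustration under stated assumptions: the list names, constant names, and values are invented, the replacement list is passed in as an argument rather than read from nonvolatile memory, and the real function was part of C firmware.

```python
# Built-in constant lists. Names and values are made up for illustration.
FIRM_LISTS = {
    "paper": {"load_timeout_ms": 4000, "eject_steps": 1200},
    "pen": {"warmup_pulses": 50},
}


def firm(list_id, const_id, replacements=()):
    """Return a soft constant, preferring an entry in the replacement list.

    `replacements` is a short sequence of (list_id, const_id, new_value)
    tuples standing in for the list kept in nonvolatile memory.
    """
    for r_list, r_const, r_value in replacements:
        if (r_list, r_const) == (list_id, const_id):
            return r_value
    return FIRM_LISTS[list_id][const_id]
```

Keeping the replacement list short matters because it is scanned on every call; the built-in lists themselves never change once the ROM is produced.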

I/O Module 

Fig. 3 shows how the I/O module is structured into physical,
link, and application layers. The physical layer deals with
the signaling on the parallel cable. The link layer deals with
logically dividing a single cable into multiple logical channels.
The application layer deals with the various data, command,
status, and pacing applications necessary to implement the
printer features.

Physical Layer: IEEE 1284 Parallel Port. The connector on the
back of the HP DeskJet 820C printer connects to the parallel
printer port on the PC. The IEEE 1284 bidirectional parallel
port specification is supported by dedicated hardware and
firmware. Hardware performs the basic data transfer of bytes
from the PC directly into RAM. Firmware supports the IEEE
1284 overhead required to put the port in the proper transfer
modes and to transfer data back to the PC.

IEEE 1284 redefines the traditional parallel port lines BUSY,
NFAULT, PERR, and so on to permit faster data transmission
and to allow data to be sent back to the PC from the printer.
Faster data rates are achieved by having the host only
pulse the NSTROBE line until the printer raises its BUSY line.
Traditionally the NSTROBE line had to be held down for a set
minimum time period (which was a relatively long time).

Table III
VLink Channel Uses

Fig. 4. VLink packet traveling on the physical cable.
(S = Start of Packet)

The IEEE 1284 overhead for mode switching is implemented
as a separate firmware process in the system. To achieve the
IEEE 1284-required 35-ms response time, the process runs at
the highest priority in the system. The process monitors the
parallel port lines and responds to changes by maneuvering
through a constant state table. This state table includes
information on what to watch for on the parallel lines, how to
respond on the parallel lines, how to get and retrieve data
at the appropriate times, and which states can be expected
next.

Link Layer: VLink. The link layer provides a simple logical
channel protocol. To prevent the printer's input buffer from
completely filling up and preventing communication with
the PC, image data and command data are separated into
two logical channels. Each of these two logical channels is
individually paced to prevent one from blocking the other.
To separate the data and commands into two logical
channels, the raw bytes are packetized so that a channel number
can be assigned to each packet.

A new HP-proprietary link-level protocol, VLink, replaces
the more sophisticated MLC protocol used in the other
DeskJet and LaserJet models. VLink requires considerably
less code, can be substantially implemented in hardware,
and doesn't require a bidirectional link.

Fig. 4 shows the VLink packets. To packetize the data, VLink
adds four additional header bytes to each block of data.
First, a start-of-packet character, S, is sent. Second, one byte
specifying the channel number is sent. Third, a two-byte word
is sent indicating the number of data bytes to follow. A packet
can contain up to 64K bytes of data. Custom I/O hardware strips
off the four header bytes, uses the channel number to select
a ring buffer in RAM in which to store the data, and
subsequently transfers the data into the ring buffer by DMA.
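
A short sketch may help make the four-byte header concrete. It is not the printer's or the driver's code. The start-of-packet byte is taken as the letter S as printed here, the length word is assumed to be sent most significant byte first, and the function names are made up; the real values and byte order are not confirmed by the text.

```python
START = b"S"        # start-of-packet character as printed in the text
MAX_DATA = 0xFFFF   # largest count an unsigned 16-bit length word can hold


def packetize(channel, data):
    """Wrap `data` in a header: start byte, channel byte, 2-byte length."""
    if not 0 <= channel <= 0xFF:
        raise ValueError("channel must fit in one byte")
    if len(data) > MAX_DATA:
        raise ValueError("data too long for one packet")
    # Big-endian length is an assumption of this sketch.
    return START + bytes([channel]) + len(data).to_bytes(2, "big") + data


def parse(stream):
    """Split a byte stream into a list of (channel, data) tuples."""
    packets = []
    pos = 0
    while pos < len(stream):
        if len(stream) - pos < 4:
            raise ValueError("truncated header")
        if stream[pos:pos + 1] != START:
            raise ValueError("missing start-of-packet character")
        channel = stream[pos + 1]
        length = int.from_bytes(stream[pos + 2:pos + 4], "big")
        data = stream[pos + 4:pos + 4 + length]
        if len(data) != length:
            raise ValueError("truncated packet")
        packets.append((channel, data))
        pos += 4 + length
    return packets
```

The parse() function plays the role that the custom I/O hardware plays in the printer: it removes the header and uses the channel number to decide where each block of data belongs.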

Table III shows how channels are allocated in the DeskJet
820C. Incoming packets arrive for either channel 0 or
channel 1. Channel 0 is used for image data. Channel 1 is used for
commands. Outgoing packets are transmitted using channels
1, 2, and 128. Outgoing channel 1 is used for responses to
commands. Outgoing channel 2 is used for the periodic
autostatus information. Outgoing channel 128 is used to supply
pacing information to the host PC.

Ring Buffers. Two ring buffers store the two incoming data
streams from the host PC. One stores the image data arriving
on VLink channel 0. The other stores commands arriving on
VLink channel 1. The ring buffers are implemented with a
combination of custom hardware and firmware.



Use                           Input Channel   Output Channel

Image Data                    0
Commands and Responses        1               1
Periodic Autostatus                           2
Periodic Ring Buffer Pacing                   128



Fig. 5 shows a diagram of a single ring buffer. The custom
ASIC selects the ring buffer in which to deposit incoming
data based upon the channel number in the VLink header.
Incoming data bytes are placed into the byte pointed to by
the ring's fill register, and the fill register is incremented. If
the fill register passes the high wrap register, the fill register
is set equal to the low wrap register. Once the fill register
equals the recover register, no more input is permitted. Any
further input for this ring buffer will cause the parallel port's
BUSY line to be set high and remain high until the recover
register is changed.
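
The register behavior described for Fig. 5 can be modeled in software as a rough sketch. The class below is an illustration, not the ASIC logic: the RingBuffer name, the dictionary standing in for RAM, and the explicit busy flag (used to tell a full ring from an empty one when fill equals recover) are assumptions of the model.

```python
class RingBuffer:
    """Software model of one input ring with fill and recover registers."""

    def __init__(self, low_wrap, high_wrap):
        self.low_wrap = low_wrap     # first address of the ring
        self.high_wrap = high_wrap   # last address of the ring
        self.fill = low_wrap         # where the next incoming byte goes
        self.recover = low_wrap      # oldest byte not yet released
        self.busy = False            # models the parallel port BUSY line
        self.ram = {}                # address -> byte, stands in for RAM

    def _advance(self, register, count=1):
        # Step a register forward, wrapping from high_wrap back to low_wrap.
        size = self.high_wrap - self.low_wrap + 1
        return self.low_wrap + (register - self.low_wrap + count) % size

    def put(self, byte):
        """Store one incoming byte; return False if input is not permitted."""
        if self.busy:
            return False
        self.ram[self.fill] = byte
        self.fill = self._advance(self.fill)
        if self.fill == self.recover:
            self.busy = True         # ring is full until data is recovered
        return True

    def release(self, count):
        """Advance the recover register, which lets input resume.

        The caller must not release more bytes than have been stored.
        """
        if count > 0:
            self.recover = self._advance(self.recover, count)
            self.busy = False
```

In a four-byte ring, the fourth stored byte wraps the fill register back onto the recover register and raises BUSY. Releasing two bytes moves the recover register forward and allows exactly two more bytes in before BUSY is raised again.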

For the command ring buffer, the grant register (a
firmware-only register not in the custom ASIC) is used to mark the
data that has been granted to the parser. When the
command associated with the data has completed executing,
its data is recovered by advancing the recover register. This
permits further data input.

For the image ring buffer, the ASIC advances the recover
register as it pulls data out for the print cartridges. This
occurs while the print sweep is taking place. This permits
further input to occur on the image channel in case the buffer
was previously full. The grant register does not exist for the
image buffer.

A third ring buffer, one that is implemented entirely in
firmware and has no custom ASIC registers, is used for an output
buffer and occasionally general-purpose memory allocation.
The same ring buffer design is used, thereby reusing the ring
buffer utility functions. In this case, memory is granted to a
process, advancing the grant register. Eventually the memory
will be recovered, advancing the recover pointer. For this
output buffer, the input register has no use.

An enhancement useful for both the command and general-
purpose output ring buffers is the ability to recover blocks
of memory out of order. This is facilitated by managing
subrecovered blocks of memory between the grant pointer and
the recover pointer. With the rule that all memory requests
and recoveries must be restricted to multiples of 4 bytes,
subrecovered blocks can be implemented using only the
RAM contained within the recovered blocks themselves.
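
A rough sketch of out-of-order recovery follows. It is an assumption-laden illustration rather than the firmware's code: the names are hypothetical, wraparound is ignored, and a dictionary stands in for the bookkeeping word that the firmware would write into the freed block itself (which is why every block must be at least 4 bytes).

```python
class OutOfOrderRecovery:
    """Illustrative grant/recover bookkeeping without wraparound."""

    def __init__(self):
        self.grant = 0            # next offset to hand out
        self.recover = 0          # everything below this offset is free again
        # start offset -> size; stands in for a header word stored inside
        # each subrecovered block
        self.subrecovered = {}

    def request(self, size):
        assert size > 0 and size % 4 == 0, "requests are multiples of 4 bytes"
        start = self.grant
        self.grant += size
        return start

    def release(self, start, size):
        assert size % 4 == 0, "recoveries are multiples of 4 bytes"
        self.subrecovered[start] = size
        # Advance the recover pointer across every contiguous freed block.
        while self.recover in self.subrecovered:
            self.recover += self.subrecovered.pop(self.recover)
```

Releasing a block that is not the oldest leaves the recover pointer where it is; the pointer catches up in one step once the older block is released.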

Output. For output, the main control I/O process receives
SEND messages from the other processes within the system. 
Like input, output is formatted as VLink packets. Three 
VLink channels are used: channel 1 to transmit command 
responses back to the host, channel 2 to transmit periodic 
autostatus back to the host, and channel 128 to transmit I/O 
buffer pacing information back to the host. 

The design handles cases when bidirectional I/O is not avail- 
able. This can happen when the printer driver is busy and 
not communicating with the parallel port, when the driver 
is not running at all, when an external device using
non-IEEE-1284-compliant cables is between the printer and the
host, when the PC does not support IEEE 1284, or when
there exist miscellaneous hardware and software conflicts 
with the parallel port. 

In cases where bidirectional I/O is not available, output is 
prevented from accumulating inside the printer by buffering 
at most one packet per VLink channel. Any previous packets
are automatically recovered back into the system and never 
transmitted. This priority scheme ensures that the host PC 
always receives the latest status. The only repercussion for 
bidirectional systems is that the driver cannot send multiple 
queries to the printer without waiting for each individual 
response. 
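
The one-packet-per-channel rule can be illustrated with a small sketch. The class and method names are hypothetical; the only behavior taken from the text is that a newer packet on a channel replaces an untransmitted older one.

```python
class OutputSlots:
    """Illustrative output buffering: at most one pending packet per channel."""

    def __init__(self):
        self.pending = {}     # channel number -> latest untransmitted packet
        self.dropped = 0      # packets recovered without being transmitted

    def send(self, channel, packet):
        if channel in self.pending:
            self.dropped += 1             # the older packet is never sent
        self.pending[channel] = packet

    def host_reads(self):
        # The host finally reads: transmit whatever is pending, in channel order.
        out = sorted(self.pending.items())
        self.pending.clear()
        return out
```

If two autostatus packets are queued on channel 2 before the host reads, only the second is transmitted, which is why a driver must wait for each response before sending its next query.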

Image Data. A key and early concept of PPA is that data
arriving at the printer will already be formatted for the custom
ASIC hardware controlling the print cartridges. In other
words, the firmware and microprocessor in the printer do
not process the data, nor do they move the data in RAM. The
image data is transferred by DMA into the image ring buffer
from the ASIC I/O block and from the ring buffer to the print
control ASIC blocks. 

Autostatus. To keep the host PC informed of the status of the 
printer, an autostatus process periodically formats a data 
block with the printer's current status. This data block is 
then given to the I/O module for transmission back to the 
host on VLink channel 2. 

I/O Pacing. When one of the input ring buffers fills up
completely and another byte arrives for this full ring buffer, the
overflowing byte causes the parallel port's BUSY line to rise
and hold off the host PC from transmitting any further data.
Such a situation could prevent the host PC from querying
the printer's status or canceling a print job, so the printer
and host work together to prevent either of the input ring
buffers from completely filling up, thus allowing the other
ring buffer to continue to receive data.



The printer transmits back to the host periodic ring buffer
status information on VLink channel 128. The data
transmitted indicates both the instantaneous free space available
in each buffer and the amount of data recovered from the
ring buffers. The amount of data recovered from the ring
buffers is cumulative. In other words, the printer reports the
total number of bytes it has recovered from all of the input
buffers.

This total number of recovered bytes permits the host PC to
determine exactly how much space is available at any time
in the printer's input buffers, as long as it keeps track of how
many bytes it has itself transmitted. This mechanism is
required because the printer's report of the free space available
in the input buffer is only an instantaneous reading. It doesn't
account for any data in transition and could thus give the
host PC a false reading.
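
The host-side arithmetic implied here is simple enough to show directly. The function names are hypothetical, and the sketch assumes the host tracks one buffer at a time and knows that buffer's capacity.

```python
def free_space(capacity, bytes_sent, bytes_recovered):
    """Bytes the host may still send to one input buffer without filling it."""
    # Bytes sent but not yet recovered are either in the buffer or in transit.
    outstanding = bytes_sent - bytes_recovered
    return capacity - outstanding


def can_send(capacity, bytes_sent, bytes_recovered, packet_size):
    """True if a packet of packet_size bytes is guaranteed to fit."""
    return packet_size <= free_space(capacity, bytes_sent, bytes_recovered)
```

For example, with a 1000-byte buffer, 5000 bytes sent so far, and a reported cumulative recovery count of 4300, the host can count on 300 free bytes regardless of how much data is still on the cable.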

Command Module 

The command module is responsible for parsing and executing
SCP (Sleek Command Protocol) commands. SCP provides
the command protocol for communication between a PPA
printer and its host driver. SCP is a binary language (as
opposed to the ASCII formatting of the traditional PCL
command language). The general command syntax is shown in
Table IV. Some SCP commands are shown in Table V.

Table IV
SCP Command Structure

Command Field   Field Size     Description

ID              2 bytes        Identifies the command
Reference       2 bytes        Reference number used to cancel commands
Priority        1 byte         Order in which the command is processed
Pad             1 byte         Unused
Length          2 bytes        Number of additional data bytes
Data            0 to n bytes   Depending on command, typically contains
                               a number of subfields
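
As an illustration of the layout in Table IV, the following sketch packs and parses the eight-byte header with Python's struct module. The big-endian byte order is an assumption (the article does not state it), and the function names and the command ID used in any example are hypothetical.

```python
import struct

# ID (2 bytes), Reference (2), Priority (1), Pad (1), Length (2).
# Big-endian byte order is an assumption, not stated in the article.
SCP_HEADER = struct.Struct(">HHBBH")


def pack_command(command_id, reference, priority, data=b""):
    """Build one SCP command: header followed by its data bytes."""
    return SCP_HEADER.pack(command_id, reference, priority, 0, len(data)) + data


def parse_command(buf):
    """Split one SCP command into (id, reference, priority, data)."""
    command_id, reference, priority, _pad, length = SCP_HEADER.unpack_from(buf, 0)
    data = buf[SCP_HEADER.size:SCP_HEADER.size + length]
    if len(data) != length:
        raise ValueError("command data is shorter than the Length field")
    return command_id, reference, priority, data
```

The Length field is what lets a parser find command boundaries in a raw byte stream: the next command starts eight bytes plus Length after the current one.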



Command Parsing. The Parser process requests raw data bytes 
from the I/O module by sending a message. Command bound- 
aries are identified, blocked, and attached to an acquired 
message. Each SCP command is attached to one message. 
This message identifies and leads the command through the 
system for execution. The individual messages are at first 
posted to the Pacer process. 

Command Pacing. The Pacer process receives messages pointing
to raw SCP command bytes and sorts them according to
priority. It continuously selects the highest-priority command
and posts that command to the appropriate module for
execution (which could be I/O, mechanism, etc.). The Pacer then
waits for the command to complete by waiting for a response
message from the selected executor.
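
The Pacer's selection rule can be sketched with a priority queue. This is an illustration only: the names are hypothetical, and it assumes that a larger priority number means higher priority and that commands of equal priority run in arrival order, neither of which the article states.

```python
import heapq
import itertools


class Pacer:
    """Illustrative priority ordering of pending commands."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker: first in, first out

    def post(self, priority, command):
        # heapq is a min-heap, so negate the priority (assumed: larger = higher).
        heapq.heappush(self._heap, (-priority, next(self._arrival), command))

    def next_command(self):
        """Return the highest-priority pending command, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A high-priority cancel posted after two print commands is selected ahead of both of them.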

Commands are sent to the Pacer not only by the Parser but 
also by other modules that may want a command executed. 
For example, when the printer door is opened, a
HANDLE_PRINT_CARTRIDGE: Change_Print_Cartridge command is given to
the Pacer for execution. This command is issued by the UI
module.

Table V
Examples of SCP Commands

Command                        Description

PRINT_SWEEP                    Configure hardware to print a sweep
                               of data
HANDLE_MEDIA                   Load and eject
HANDLE_PRINT_CARTRIDGE         Print cartridge change, wipe, spit, etc.
CONFIGURE_PRINT_CARTRIDGE      Print cartridge temperatures
STATUS_REQUEST/REPORT          Synchronous status information
CONFIGURE_AUTOSTATUS           Asynchronous status information
CANCEL_COMMAND/DATA            Flush a command or image data
RESTART                        Reboot printer
ECHO_DATA, PERFORM_TEST,       Miscellaneous functions
SET_ALIGNMENT_INFORMATION
UI_STATE, UI_MONITOR           User interface set and read
ATOMIC_COMMAND                 Low-level manufacturing and test command


Command Execution. A third command module process, the
Executor, executes SCP commands designated for the Parser.
Generally, SCP commands are delegated to their respective
modules for execution. A few commands, such as the
CANCEL_COMMAND command, are executed by the command
module itself.

Mechanism Module 

The mechanism module executes the mechanism-related
SCP commands, maintains the system's mechanical state,
handles periodic print cartridge servicing needs, handles all
motor needs and functions, and prints sweeps of data.

The mechanism module consists of two processes and several
interrupt service routines. The top-level process, the
Mechanism State Machine, manages the high-level mechanism
state (cover open, paper loaded, etc.). The low-level process,
the Walker/Dispatcher, manages the execution of mechanism
motion scripts called flows.

Mechanism State Machine. The Mechanism State Machine is a
process that maintains the current mechanical state of the
system, takes the proper actions when state changes occur, and
returns to previous states after asynchronous state changes
occur. Fig. 6 shows a small portion of the mechanism state
machine to give an example of its hierarchical nature.

The mechanism starts in the Entry state and after
initialization proceeds to an Idle state. As a print job comes in, a
HANDLE_MEDIA: Load_Paper command causes an entry into the
Load state, and when paper is loaded, to a Ready to Print state.
During these states and changes, mechanism flows (or
scripts) are performed and the state machine responds to
asynchronous events such as a print cartridge change or
paper jam. When responding to asynchronous events, an
asynchronous state change is made and the appropriate
flows are performed. The state then reverts back to the state
that existed when the asynchronous event occurred.
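
The revert-after-asynchronous-event behavior can be sketched as follows. The state names come from the text, but the class, its methods, and the flow names are hypothetical, and the real state machine is hierarchical rather than flat.

```python
class MechanismStateMachine:
    """Illustrative flat model of the mechanism state machine."""

    def __init__(self):
        self.state = "Entry"
        self.flows_run = []       # record of flows performed, in order

    def run_flow(self, name):
        self.flows_run.append(name)

    def goto(self, state, flow=None):
        """Ordinary transition, optionally performing a flow on the way."""
        if flow is not None:
            self.run_flow(flow)
        self.state = state

    def async_event(self, state, flow):
        """Asynchronous state change: act, then revert to the prior state."""
        previous = self.state
        self.state = state
        self.run_flow(flow)
        self.state = previous
```

A print cartridge change that arrives while the mechanism is Ready to Print runs its flow and leaves the machine back in Ready to Print.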

Mechanism Flows. Mechanism flows are small lists of individual
mechanism instructions, typically motor moves, to
complete a high-level mechanical task. For instance, when
starting the Load Paper state, a load paper flow is executed. This
flow contains a list of individual motor move commands to
accomplish the multimotor task of loading paper.

Flows are written in a custom scripting language. The reason
for the custom scripting language is to permit development
of motor motion without recompiling and building the
firmware set. A flow can be downloaded to the printer and
executed, permitting an easy standalone mechanism
development environment. This technique is also used during
manufacturing to invoke custom manufacturing motor
movements. When a particular mechanism flow has been
developed, the flow can be incorporated into the firmware
set and executed just as during development. Table VI lists a
few of the available flow commands.

Table VI
Partial List of Flow Scripting Commands

Flow Opcode                  Parameters

Carriage Motor Move          Speed, position
Paper Motor Move             Speed, distance
Wait Carriage Motor Done
Wait Paper Motor Done
Jump to Sub Flow             Flow ID
Goto Flow                    Flow ID
Fan                          On/off
Relative Branch              Condition, branch distance
Exit Flow



Walker/Dispatcher. The flow script executor is a process called
the Walker/Dispatcher. This process receives messages with
either addresses of flows to execute or addresses of
completion routines to execute. When given an address of a flow to
execute, the Walker/Dispatcher looks up each opcode and calls
the corresponding function to perform the opcode. When
given a completion routine to execute, the Walker/Dispatcher
executes the completion routine and then retries any opcode
that had to wait for a completion before continuing. This is
similar to a microprocessor retrying an instruction after a
page fault is corrected.

Actors and Gaffers. The functions that implement the flow
opcodes have been nicknamed actors. There is one actor
function for each flow opcode. A function table is used to
select which actor function to execute for each flow opcode
encountered.

An actor function parses the flow opcode's parameters,
verifies that the particular mechanism resource isn't in use
(generally a motor), and makes the appropriate call to the
motor control code to start the proper motor movement.
If a resource is in use preventing the actor from continuing
execution, execution of the actor terminates and is retried
when a resource becomes free.

A completion routine is passed to the motor control code to
be executed when the motor has completed motion. These
completion routines have been nicknamed gaffers. They
deal with errors during motion, do any final cleanup, and
cause the script executor to retry an actor function that
couldn't execute because of a resource limitation. Gaffers
aren't executed by the motor control code, but rather are
posted to the Walker/Dispatcher for execution.
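
The interplay of the function table, actors, and gaffers can be sketched as below. Everything here is a simplification under stated assumptions: the opcode names and class are hypothetical, only one resource (the carriage motor) is modeled, and the sketch pretends the motor finishes at once, so the gaffer is posted as soon as the move starts but only runs when the dispatcher processes its message queue.

```python
from collections import deque


class Walker:
    """Illustrative flow executor with one actor table and posted gaffers."""

    def __init__(self):
        self.busy = set()           # mechanism resources currently in use
        self.messages = deque()     # gaffers posted for later execution
        self.log = []
        # Function table: one actor per flow opcode.
        self.actors = {
            "CARRIAGE_MOVE": self.carriage_move,
            "EXIT_FLOW": self.exit_flow,
        }

    def carriage_move(self, position):
        if "carriage" in self.busy:
            return False            # resource in use: actor ends, retried later
        self.busy.add("carriage")
        self.log.append(("move", position))
        # Simulated completion: motor control posts the gaffer to the walker.
        self.messages.append(self.carriage_gaffer)
        return True

    def carriage_gaffer(self):
        self.busy.discard("carriage")    # final cleanup frees the motor

    def exit_flow(self):
        self.log.append(("exit",))
        return True

    def run(self, flow):
        pc = 0
        while pc < len(flow):
            opcode, params = flow[pc]
            if self.actors[opcode](*params):
                pc += 1
            elif self.messages:
                self.messages.popleft()()    # run one gaffer, then retry opcode
            else:
                raise RuntimeError("stalled waiting for a completion")
```

Running a flow with two back-to-back carriage moves shows the retry: the second move is refused while the carriage is busy, the gaffer frees it, and the same opcode then succeeds.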

Motor Control. Motor control is accomplished via a combination
of process and interrupt threads of execution. Generally
the execution that occurs in process space would include all
initial motion and interrupt setup calculations. A transition
is made to the interrupt space of a selected hardware
interrupt with a call to Interrupt_Context(). Once in interrupt space,
calls can be made to Wait_For_Interrupt() to effectively suspend
the execution until the associated interrupt occurs again.
Execution continues, including any additional suspensions
for additional interrupts, until time to inform the
Walker/Dispatcher flow executor of completion. A message is posted
to the Walker/Dispatcher with the address of the appropriate
completion routine, the gaffer.



An example of a motor control function using such a
combination is CM_Move_And_Hold(), which moves the carriage motor
to a specific location, holds there, and posts the given
completion routine. CM_Move_And_Hold() is called with a motion
acceleration profile, a final position, a few other motor
adjustments, and a pointer to a completion message. The
function does some preprocessing to account for previous motor
motion errors, to calculate the direction and distance to
travel, and to select acceleration and slew parameters. The
transition to interrupt space is made. The function then goes
through three loops: one for accelerating, one for slewing,
and one for decelerating. Each loop calls Wait_For_Interrupt()
and sets up the next incremental motion request. Finally, at
the completion of the motion, the function posts the
completion message to the Walker/Dispatcher process.

Configuration RAM 

The HP DeskJet 820C printer has a block of nonvolatile 
RAM that is used for configuring the printer in ways that 
must survive shutdowns. A C-language structure is used to
organize this configuration data. Two small processes read 
and write the data from the nonvolatile RAM and control 
when this must be done. Examples of fields stored in non- 
volatile RAM are shown in Table VII. 



Table VII
Partial List of Configuration RAM Contents

Configuration Field            Description

Startup Tests                  List of startup tests to perform
Print Cartridge Calibration    Stored print cartridge calibration figures
Page Count                     Count of how many pages have been printed
Firm Replacements              Set of constant replacements
Alignment                      Dual print cartridge alignment adjustment
                               factors
Mechanism State                Indication of whether the mechanism was
                               properly stored before shutdown

A copy of the nonvolatile RAM is kept in normal RAM. This
copy is made upon startup by the Configuration process. Any
process can access the configuration data copy as long as its
access is protected by locking a semaphore. After a process
has made any change, it must send a SAVE_CONFIGURATION
message to the Configuration process. Configuration schedules
the nonvolatile RAM update by sending a message to the
NV RAM process.

A second process actually reads and writes the nonvolatile
RAM. This is to avoid holding up the system, since the
physical reading and writing of nonvolatile RAM takes time. The
NV RAM process executes at a very low priority so that
nonvolatile RAM is only updated when there is nothing else to
do in the system.
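
The shadow-copy scheme can be sketched as follows. The class and field names are hypothetical, a dictionary stands in for the nonvolatile part, and collapsing several pending save requests into a single write is an assumption about how a low-priority writer would naturally behave, not something the article states.

```python
import queue
import threading


class ConfigStore:
    """Illustrative RAM shadow of nonvolatile configuration data."""

    def __init__(self, nonvolatile):
        self.nonvolatile = nonvolatile          # stands in for the NV RAM part
        self.shadow = dict(nonvolatile)         # copy made at startup
        self.lock = threading.Semaphore(1)      # guards the shadow copy
        self.save_requests = queue.Queue()      # SAVE_CONFIGURATION messages

    def update(self, field, value):
        with self.lock:
            self.shadow[field] = value
        self.save_requests.put("SAVE_CONFIGURATION")

    def nv_ram_process_step(self):
        """Run when the system is idle; returns True if NV RAM was written."""
        pending = False
        while not self.save_requests.empty():
            self.save_requests.get_nowait()
            pending = True
        if pending:
            with self.lock:
                self.nonvolatile.update(self.shadow)
        return pending
```

Two quick page-count updates change only the RAM copy until the low-priority step runs, at which point the nonvolatile data is brought up to date with one write.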
Power-On/Shutdown Sequencing

A process known as Pstate is used to facilitate a controlled
startup and shutdown procedure. This is important to ensure
that dependencies are handled during startup and shutdown.
To accomplish this, a phased startup or shutdown is used.
During startup phase 1, processes cannot assume that any
other process has had a chance to execute any code. Each
process initializes only its own internal data structures.
During startup phase 2, processes can assume that all of the
other processes have completed their phase 1 code. There is
no hard and fast rule governing what is to be done at each
phase. It is simply known that within a given phase, a
process can assume that all other processes have completed all
previous phases. Similar procedures are used in shutdown.

The startup sequence proceeds as follows. At startup, Pstate
broadcasts to each process desiring startup information the
START message. Processes indicate they want the startup
information by belonging to the Startup class. Included within
the START message is a phase number. The first time START is
broadcast, the phase field is set to 1. Once each process has
responded to this first START message, another START message
is broadcast, this time with the phase field set to 2, and again,
each process will respond. Finally, once all phases have been
completed, Pstate broadcasts the message START_SEQUENCE_
DONE. At this point, all processes can assume that the system
is operational.
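
The phased broadcast can be sketched in a few lines. The function names are hypothetical, and a plain function call stands in for posting a message and waiting for the response.

```python
def make_process(name):
    """Create a stand-in process that records each message it handles."""
    def handle(message, log):
        log.append((name, message))
    return handle


def run_startup(startup_class, phases=2):
    """Broadcast START once per phase, then START_SEQUENCE_DONE."""
    log = []
    for phase in range(1, phases + 1):
        # Every member responds to this phase before the next one begins.
        for process in startup_class:
            process(("START", phase), log)
    for process in startup_class:
        process(("START_SEQUENCE_DONE", None), log)
    return log
```

The resulting log shows that no process sees phase 2 until every process has handled phase 1, which is the only guarantee the scheme makes.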

Internal Test Print 

A small process is used to perform the internal test print
feature. The Test Print process waits until it is handed the
DO_TEST message. It then temporarily disables I/O and takes
over the image input buffer, filling it up with test print data.
To print, this process builds its own HANDLE_MEDIA: Load,
PRINT_SWEEP, and HANDLE_MEDIA: Eject commands and sends
them to the command Pacer for execution. Finally I/O buffers
are restored and I/O reenabled.

User Interface

The user interface module, UI, is designed to respond to
stimulus of various events happening in the system. A state
table is used to map a stimulus to a particular action and
subsequent state. Each state also includes a set of exit
conditions. The process's main function is to respond to UI_EVENT
messages which are posted when front-panel changes occur.

The printer cover door and buttons generate interrupts when
they change. Each of these has an interrupt service routine
that takes care of debouncing, using deferred procedure
calls, and posts a message to the UI process indicating the
event change. The UI process then marches through its state
table to make the internal change to the printer and the visual
change to the user.
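
A state table of this kind maps a (state, stimulus) pair to an action and a next state. The sketch below is purely illustrative: the states, stimuli, and actions are invented for the example and are not taken from the printer's actual table.

```python
# (current state, stimulus) -> (action, next state); all entries hypothetical.
UI_STATE_TABLE = {
    ("Idle", "DOOR_OPENED"): ("present_cartridges", "Door Open"),
    ("Door Open", "DOOR_CLOSED"): ("park_cartridges", "Idle"),
    ("Idle", "POWER_BUTTON"): ("begin_shutdown", "Shutting Down"),
}


def handle_ui_event(state, stimulus):
    """Look up the action and next state; unknown stimuli leave the state as is."""
    action, next_state = UI_STATE_TABLE.get((state, stimulus), (None, state))
    return action, next_state
```

Keeping the mapping in data rather than in branching code makes it easy to review every stimulus a state responds to.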

Printer Status

Printer status is managed by the status module. This module
receives update indications from the rest of the system,
composes specific status responses back to the host PC, and
composes periodic autostatus responses back to the host PC.

Autostatus. Table VIII shows a sample of the autostatus data.
Autostatus is a fixed structure of bits and numeric fields that
is transmitted back to the host on a periodic basis. The
Autostatus process is responsible for building the transmitted
data block and handing it over to the I/O module for
transmission to the host.

Table VIII
Examples of Autostatus Fields

Field                        Description

Misload                      Paper load failed, most likely out of paper
Door Open                    Cover door open
Media Jam                    Paper jam detected
Print Cartridge Unaligned    Dual print cartridges not properly aligned
Last Error Code              Last error encountered by the firmware

Status Update. UPDATE_ITEM messages are posted to the status
module to update specific fields in the autostatus block. At
this time additional notification to the rest of the system can
be made by the status module. For instance, the status module
will notify the UI module of paper misloads, cover door
openings, missing print cartridges, and so on.

Status Responses. The host PC can also request specific in- 
formation from the printer. The Status Request process receives 
status request commands from the Command Pacer module, 
formats the result, and again hands the data over to the I/O
module for transmission to the host. 

The Simple Process 

There are small functions or commands that must be
executed that don't really fit logically into any of the modules in
the system. Logically, for modularity reasons, they might
each form their own module or process. But in an effort to
conserve ROM and RAM, these functions have been combined
into a single process. Table IX shows a partial listing
of the functions handled by the Simple process.



Table IX
Examples of Simple Process Functions and Commands

Function                     Description

SCP Cmd NV RESET             Reinitializes nonvolatile RAM to default
                             values
SCP Cmd SET ALIGN INFO       Stores the print cartridge alignment
                             information received from the host
SCP Cmd SET PAGE COUNT       Stores a new value for the page counter
SCP Cmd REPLACE FIRM         Stores a new value for the specified firm
Calibration Functionality    Performs periodic calibration functions
Flash Memory Support

To provide for firmware upgrades during development and
the early stages of manufacturing, flash memory is temporarily
substituted for ROM. The flash memory can be reprogrammed
whenever a new firmware set is available.

The SCP command language provides a command, EXECUTE_
DATA, which causes the firmware to jump to data downloaded
into the image buffer. Before making this jump, the printer
shuts down all interrupts to guarantee that none of the existing
firmware is still executing. To reprogram the flash memory,
the downloaded program contains both the code to
reprogram the flash memory and the data to program into the
flash memory. When this downloaded program has completed
reprogramming the flash memory, it executes a 68000 reset
instruction, effectively returning control back to the flash
memory and beginning execution of the newly installed
firmware.

Conclusion 

The HP DeskJet 820C printer firmware architecture successfully
met or exceeded all cost, quality, schedule, and throughput
goals. This is particularly satisfying considering the firmware
platform started completely from scratch, with design
leverage only in the mechanism flow scripting arena.
PPA Printer Controller ASIC
Development 



As the first Printing Performance Architecture printer, the HP DeskJet 
820C needed a completely new digital controller ASIC design. The chip's 
architecture was optimized for the specific requirements of PPA. 
Concurrent development of hardware and firmware through the use of 
hardware emulators and attention to regulatory issues during the design 
helped the product meet all of its requirements on schedule. 

by John L. McWilliams, Leann M. MacMillan, Bimal Pathak, and Harlan A. Talley



The Printing Performance Architecture (PPA) used in the
HP DeskJet 820C printer is a significant step forward from
any previous HP inkjet printer product in providing the
consumer with a high-performance product at an excellent
price point. Since PPA redistributes the printing tasks
between the host and the printer, a complete redesign of the
digital controller ASIC in the printer was required. This
redesign effort took into account the overall product
constraints of cost and time to market as well as all applicable
government regulations. The result is a highly integrated
ASIC that implements all digital functions performed by the
HP DeskJet 820C on a single chip. This high level of integration
significantly decreased the cost of the electronics in the
HP DeskJet 820C compared to the previous-generation product
while maintaining the printer's performance. This article
describes the system considerations, engineering decision
trade-offs, and development methodologies that played a
role in the development of the digital controller ASIC for the
HP DeskJet 820C.

The design of the controller ASIC had to be done under
numerous constraints. As in any consumer-oriented product,
the foremost consideration during design was the final cost
to the buyer. The Printing Performance Architecture, as
described in the article on page 6, was developed to reduce the
total cost of the printer. PPA allows several optimizations in
the digital architecture. In today's competitive environment,
time to market is nearly as critical a constraint as cost.
Meeting the time-to-market constraint required concurrent
development of hardware and firmware and a bug-free ASIC
at netlist release. These needs were addressed by using
hardware emulators during development. Finally, the printer
had to meet or exceed all government regulations including
those pertaining to EMI and ESD. Taking these needs into
account during the ASIC design helped the product pass all
requirements on schedule.

Digital Architecture 

Regardless of their specific type, all printers require several
pieces of digital hardware. These pieces include a
microprocessor to control the printer, RAM for data, ROM for
firmware, and custom digital logic for printer-specific functions.
By optimizing each of these pieces, significant cost savings
were realized in the digital ASIC.

PPA significantly reduces the cost of the printer by optimally
partitioning the printing tasks between the software running
on the host and the hardware and firmware running in the
printer. The partitioning is done without sacrificing the
printer's performance. All tasks that can be done on the host
computer without severely affecting application performance
are done in the driver. Tasks with real-time constraints are
performed by the hardware and firmware in the printer.
Because the host performs the majority of the data
manipulation, data that is sent to the printer is in a format that
is very close to the final form used to fire the printheads.
Because of this, the digital architecture was designed with a
guiding principle of "the processor does not touch the data."
Once this principle was adopted, the ASIC team was able to
make several important design decisions.

First, a relatively low-power processor is all that is needed,
since the processor does not manipulate the data. After
surveying the available microprocessors, the 16-MHz version
of the Motorola 68EC000 was chosen as the best fit. Second,
since the number of tasks the firmware performs is limited,
the code size can be kept small enough that a ROM with all
firmware can be integrated on the ASIC, eliminating the
need for an external flash memory or ROM. Third, all data
manipulations need to be done in hardware, which limits
those manipulations to being relatively simple. Finally, the
memory requirements are limited: a 1M-bit DRAM is
sufficient for the data needs. The DRAM holds all firmware
variables and stacks as well as all printing data. Even with the
DRAM doing double duty, the memory bandwidth
requirements of the architecture are fairly low, and the product is
able to use a low-cost, 1M-bit, nibble-wide DRAM.

A block diagram of the HP DeskJet 820C's digital architecture
is shown in Fig. 1. The digital electronics consists of
three main components: the digital ASIC, a 1M-bit DRAM,
and an optional external flash memory or ROM. The digital
ASIC consists of a 68EC000 microprocessor, a 64K-byte
ROM, a 55,000-gate standard cell block, and a 1K-byte SRAM
used as a data cache. In addition to the external memory
components, the digital ASIC is connected to the I/O
connector (IEEE 1284), the printer motor ASIC, the print head
ASIC, and an optical encoder which provides carriage
position information.

Fig. 1. HP DeskJet 820C controller ASIC block diagram.

The majority of the standard cell area is devoted to the data
path, which is the path the data follows as it moves from the
I/O connector, through the DRAM and SRAM, and up to the
pen ASIC. The remaining logic is used for interfacing to the
microprocessor, for controlling motors, and for keeping
track of the current carriage position. All memories, including
registers in the standard cell block, are memory mapped
into the 68EC000's standard address space.

Flash or ROM 

The ASIC is designed to be able to read code for the processor
from one of three sources: a flash memory device, an
external mask-programmable ROM (MROM), or the internal
ROM. The reason for the three separate sources is to better
meet time-to-market constraints. At the beginning of the
manufacturing ramp, code was stored in flash memory. That
way, final firmware did not need to be released until just
before the start of the ramp. As soon as the firmware was
stable, it was released to both the MROM vendor and the
digital ASIC vendor for programming into the internal ROM.
However, MROM lead times are much shorter than
general-purpose ASIC lead times, so MROM parts were available
much sooner than ASICs with properly programmed internal
ROMs. Consequently, printers were built with MROMs for a
period of time until ASICs with final firmware were available
(MROMs are about half the cost of flash parts).



Motor Control 

The HP DeskJet 820C has three motors: a dc motor for moving
the carriage across the paper, a stepper motor for picking
and advancing the paper, and a second stepper motor for 
controlling the pen service station. The stepper motors are 
controlled in an open-loop process by the firmware. The 
firmware controls a stepper motor move by writing appro- 
priate phase and pulse width data to registers in the ASIC. 
Hardware then generates the appropriate signals for the 
motors. The phase and pulse width data determines the 
direction and speed of the moves. 

The carriage motor is controlled by a firmware-based control
loop that monitors the carriage position and adjusts the
motor control signals appropriately. The carriage position is
determined through the use of an optical encoder. The optical
encoder consists of a light emitter-detector pair with a
plastic encoder strip between them. As the carriage moves
across the paper, the light emitter-detector pair senses that
it is moving along the plastic strip, and sends some signals
to the ASIC. The hardware in the ASIC takes this information
and uses it to keep track of the current carriage position.
Using the carriage position, the firmware tracks the
carriage's speed and acceleration and adjusts the motor energy
appropriately.

PPA I/O Packet Format 

The data from the host comes to the printer in a simple
packetized format. As shown in Fig. 2, the packets are made
up of two pieces: header information and data. The header
information consists of a start-of-packet (SOP) byte, a
channel byte, and a two-byte data-size field that reflects the
number of bytes in the data field (0 to 65K).

Fig. 2. PPA data packet format, from first byte through the data to last byte.

In the HP DeskJet 820C, packets from the host may contain
one of two types of information: command data or image
data. The channel byte determines which type of data is
contained in the packet (hence, in the HP DeskJet 820C, the
channel byte will be one of only two distinct values).
Command data contains PPA printer control commands, while
image data contains information that is to be printed on a
page. The image data that is sent to the printer is in a form
that resembles a bitmap of the image, and therefore requires
a minimum amount of reformatting before being used to fire
the printheads. To minimize the amount of data that must be
sent over the I/O cable, image data is optionally compressed
before being sent to the printer.

Data Path 

A block diagram of the data path is shown in Fig. 3. Data
enters the ASIC through the I/O cable. Hardware depacketizes
it and separates it into the image and command channels.
Image data is transferred to one buffer in the DRAM and
command data to another, both by DMA. Command data is
consumed by the firmware with no hardware interference.
Image data is moved by the servant hardware from the
DRAM to the SRAM. During the move, the image data is
decompressed if necessary. From the SRAM, data is moved
to a shift register from which it is serially shifted up to the
carriage board and is used to fire the pens.

Input/Output 

The standard cell I/O block implements the low-level hard- 
ware that takes in packets of information from the host via 
the parallel port. It contains hardware support for IEEE 1284 
compatibility mode and extended capability port (ECP) in 
the forward direction from the host to the printer. The hard- 
ware also supports, with firmware assist, reverse-channel 
nibble mode for sending information back to the host com- 
puter. The I/O block also contains hardware that strips the 
data stream of its packet header information, separates the 
packets into command and image data, and sends them to 
the I/O DMA block. From the header information, the hard- 
ware checks the start-of-packet byte to make sure it is the 
correct value, uses the channel byte to select the appropriate 
DMA channel, and uses the size field to determine when to 
expect a new packet. 

The I/O DMA block receives data via the I/O interface and 
stores it into either the command buffer or the image buffer 
in the DRAM. These buffers are designed as general-purpose 
circular buffers that can reside anywhere in the DRAM 
memory space. The command buffer is emptied by the 
processor as it executes the commands. Image data is con- 
sumed by the servant hardware. As data from the buffers is 
consumed, the host is notified, via the firmware, of the avail- 
able buffer space, and more data is sent down. This archi- 
tecture allows the printer to make optimal use of its limited 
memory resources. 
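
As a rough software model of this flow control, the sketch below 
implements a circular buffer whose free space can be reported back to 
the sender. The class and method names are hypothetical and the real 
buffers are managed by DMA hardware and firmware, not by code like 
this. 

```python
class CircularBuffer:
    """Fixed-size circular byte buffer that reports its free space."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0   # next index to write
        self.tail = 0   # next index to read
        self.count = 0  # bytes currently stored

    def free_space(self):
        """Space the host may still fill (what would be reported upstream)."""
        return self.capacity - self.count

    def write(self, data):
        """Store incoming bytes, wrapping around the end of the buffer."""
        if len(data) > self.free_space():
            raise BufferError("sender exceeded the advertised free space")
        for b in data:
            self.buf[self.head] = b
            self.head = (self.head + 1) % self.capacity
        self.count += len(data)

    def read(self, n):
        """Consume up to n bytes, freeing their space for new data."""
        n = min(n, self.count)
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % self.capacity
        self.count -= n
        return bytes(out)
```

Because the sender only transmits what free_space() allows, a small 
buffer can carry an arbitrarily long job without overflowing. 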

DRAM Controller/Arbiter 

The external DRAM is connected to its own nibble-wide bus. 
Hardware arbitrates accesses to the DRAM between the 
I/O DMA hardware, the servant hardware, and the micro- 
processor. The arbitration method is a combination of priority 
and round-robin schemes. Both the I/O and the servant hard- 
ware processes have real-time constraints that dictate the 
maximum length of time they can be blocked while waiting 
for access to the DRAM. Although the microprocessor is less 
tightly constrained, it is important that it not be completely 
locked out of the DRAM for extended periods of time. 
Hence, while each block has a priority for DRAM accesses, 
the hardware is designed so that no one unit can hold the 
DRAM bus continuously. 
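
The article does not spell out the arbitration rules. Purely as an 
illustration of how a fixed priority can be blended with a guaranteed 
turn for every requester, the sketch below grants the bus by priority 
unless some unit has already waited for its limit of accesses. The 
priority order, the wait limits, and all names are assumptions and are 
not taken from the ASIC design. 

```python
PRIORITY = ["io_dma", "servant", "cpu"]            # hypothetical order
MAX_WAIT = {"io_dma": 2, "servant": 2, "cpu": 4}   # hypothetical limits


def arbitrate(requests, waits):
    """Grant the DRAM bus for one access and update the wait counters.

    requests: set of unit names asking for the bus.
    waits: dict mapping unit name to accesses spent waiting so far.
    Returns the granted unit, or None if nobody is requesting.
    """
    waiting = [unit for unit in PRIORITY if unit in requests]
    if not waiting:
        return None
    # A unit that has reached its wait limit is served first;
    # otherwise the highest-priority requester wins.
    overdue = [unit for unit in waiting if waits[unit] >= MAX_WAIT[unit]]
    granted = overdue[0] if overdue else waiting[0]
    for unit in waiting:
        waits[unit] = 0 if unit == granted else waits[unit] + 1
    return granted
```

With all three units requesting on every access, the low-priority 
processor still receives periodic grants, so no single unit holds the 
bus continuously. 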

In addition to arbitration, the DRAM controller takes care 
of the low-level interface to the DRAM. It interfaces the 4-bit 
DRAM data bus to the 8-bit microprocessor data bus. Using 
fast page mode accesses, it retrieves two nibbles and con- 
catenates them into a byte. It guarantees that the DRAM 
refreshes take place at appropriate times. Although only a 
1M-bit part is used in the HP DeskJet 820C, the controller 
also supports 4M-bit DRAMs. 
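
The nibble-to-byte step can be shown with a small sketch. The article 
does not say which nibble arrives first, so placing the first nibble 
in the high half of the byte is an assumption, as is the function 
name. 

```python
def bytes_from_nibbles(nibbles):
    """Concatenate pairs of 4-bit values into bytes (first nibble high)."""
    if len(nibbles) % 2 != 0:
        raise ValueError("need an even number of nibbles")
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        high, low = nibbles[i], nibbles[i + 1]
        if not (0 <= high <= 0xF and 0 <= low <= 0xF):
            raise ValueError("nibble out of range")
        out.append((high << 4) | low)
    return bytes(out)
```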

Servant 

One of the key contributions of the PPA architecture is that 
it moves much of the pixel processing into the driver. The 
image data sent to the printer is in a format nearly ready to 
be used to fire the pens. The only significant operation that 
is not done by the driver is the operation of picking out the 
individual bits (corresponding to dots on the paper), and 
sending them to the pens in the correct order. The servant 
logic, so named because it serves the pen by providing it 
with pixel data, accomplishes this by loading the data into 
an on-chip cache (the SRAM), and subsequently pulling it 
out at the correct time and in the correct order. The cache 
is divided into sets of swing buffer pairs (one pair for each 
color) such that while data is being taken out of one swing 
buffer by the pixel processor logic, new data is being loaded 
into the other swing buffer by the servant load logic. When 
the pixel processor consumes all the data in one swing buffer 
and switches to the other swing buffer, the servant load 
process begins loading new data into the first buffer. This 
process is depicted in Fig. 4. 

The PPA driver provides the pixel data in swing buffer loads, 
which are chunks of a bitmap eight pixels wide and the same 
height as the printer's pens. The driver provides the swing 
buffer loads in exactly the order required by the pens. The 
servant load process transfers the data by DMA from the 
DRAM to the SRAM as it is needed to fire the pens. During 
the process of moving data from the DRAM to the SRAM, 
the data is decompressed if it was sent over the I/O in a 
compressed format. 
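
The swing buffer hand-off can be modeled in software as follows. This 
is a simplified, sequential sketch of what the hardware does 
concurrently, for one color only; the function name is hypothetical 
and decompression is omitted. 

```python
def run_swing_buffers(loads):
    """Simulate one swing buffer pair; return the loads in firing order.

    loads: iterable of bytes objects, one per swing buffer load, in the
    order the driver sent them.
    """
    buffers = [None, None]
    fired = []
    active = 0                       # buffer the pixel processor reads
    pending = iter(loads)
    buffers[active] = next(pending, None)   # prime the first buffer
    while buffers[active] is not None:
        # While the pixel processor drains the active buffer, the servant
        # load logic fills the other buffer with the next load from DRAM.
        buffers[1 - active] = next(pending, None)
        fired.append(buffers[active])
        buffers[active] = None       # fully consumed
        active = 1 - active          # swing to the freshly loaded buffer
    return fired
```

Because the driver already supplies the loads in the order the pens 
need them, the model never reorders data; it only alternates between 
the two buffers. 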



Fig. 4. Swing buffer operation. 

SRAM Arbiter 

The SRAM arbiter arbitrates memory requests between the 
servant load process, the pixel processor, and the micro- 
processor. The arbiter implements a priority-based scheme. 
Since the microprocessor accesses the SRAM only infre- 
quently, it is given the lowest priority. On the other hand, 
data for the pen must be immediately available on demand 
(the carriage cannot be paused for the pen to wait for data). 
Hence, the pixel processor is given the highest priority. The 
servant load process has the middle priority. 

Pixel Processor 

The pixel processor is responsible for placing the bits sent 
to the pens in exactly the order in which they are needed to 
fire the pens. Since the nozzles on the pen are staggered, the 
order in which the bits are needed is not entirely straight- 
forward. As the correct bits are pulled out of SRAM, they are 
placed in a shift register from which they are serially shifted 
to the pen driver IC on the carriage board. 

Each bit sent to the pens corresponds to a nozzle firing or not 
firing. Firing data sits in the SRAM in byte-wide chunks. Each 
byte corresponds to eight columns of dots on the printed 
page. All dots in a column are fired before beginning to fire 
the next column. Hence each byte in a swing buffer is ac- 
cessed eight times, once for each column. After all eight col- 
umns are fired, the logic switches to the other swing buffer. 
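
A minimal sketch of this column extraction is shown below. It assumes 
one byte per nozzle row and that the most significant bit holds the 
first column; neither detail is stated in the article, and the 
reordering needed for the nozzle stagger is left out. 

```python
def columns_from_load(load):
    """Return the eight dot columns held in one swing buffer load.

    load: bytes, one byte per nozzle row; each byte holds that row's dots
    for eight adjacent columns (MSB-first is an assumption).
    Each returned column is a list of 0/1 values, one per nozzle row.
    """
    columns = []
    for col in range(8):
        shift = 7 - col
        # Every byte is visited once per column, eight times in total.
        columns.append([(row_byte >> shift) & 1 for row_byte in load])
    return columns
```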

Pen Interface 

This block communicates with the analog pen driver IC over 
a custom serial interface. The pen interface receives data 
from the pixel processing block and shifts it serially to the 
pen driver IC. It also generates the timing pulses that the 
pen driver IC uses to fire the pens and put ink dots on the 
page. In addition to sending pen firing data, the interface 
sends setup information to the pen driver IC to adjust vari- 
ous printing parameters that affect print quality. The serial 
interface is bidirectional, enabling the pen driver IC to send 
back information about the pens' status. For example, the 
pen driver IC is able to measure temperatures of the pens, 
which are important parameters in thermal inkjet printing. 
This information is sent to the digital IC and read by the 
firmware. Firmware then uses this information to adjust 
printing to ensure that the customer will receive optimum 
print quality. 

Because of the staggering of the nozzles on the printhead 
(see Fig. 1 on page 40), for each column of dots on the page, 
the pen must be fired multiple times. The pens must be fired 
every time a set of vertically aligned nozzles is at the correct 
position on the page. If all nozzles in a dot column are not 
fired at the same physical position on the page, that column 
will appear jagged to the customer. Special logic in the chip 
ensures the proper alignment of the dots and therefore opti- 
mum print quality. This logic uses the carriage position as 
determined directly from the optical encoder, which is at a 
relatively low resolution, and interpolates it up to the resolu- 
tion needed to fire the pens. The interpolation is done by 
phase-locked-loop-like logic that measures the time it takes 
the carriage to move 1/150 inch, and divides this time down 
to get the time it takes the carriage to move a distance equal 
to the nozzle stagger distance. By doing this, the logic is able 
to issue firing pulses to the pen at the correct time. 
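
The arithmetic behind this interpolation can be sketched as follows. 
The 1/150-inch encoder interval comes from the text; the stagger 
distance used in the example (1/600 inch) and the function names are 
assumptions chosen only to make the numbers concrete. 

```python
from fractions import Fraction

ENCODER_PITCH_IN = Fraction(1, 150)   # distance between encoder edges


def stagger_delay(encoder_period_s, stagger_in):
    """Time for the carriage to travel one stagger distance.

    encoder_period_s: measured time to move 1/150 inch.
    stagger_in: nozzle stagger distance in inches (hypothetical value).
    """
    return encoder_period_s * (stagger_in / ENCODER_PITCH_IN)


def firing_times(edge_time_s, encoder_period_s, stagger_in):
    """Firing pulse times interpolated within one encoder interval."""
    step = stagger_delay(encoder_period_s, stagger_in)
    pulses = int(ENCODER_PITCH_IN / stagger_in)
    return [edge_time_s + k * step for k in range(pulses)]
```

For example, if the last 1/150 inch took 1 ms and the stagger were 
1/600 inch, stagger_delay(Fraction(1, 1000), Fraction(1, 600)) gives 
1/4000 s, so four pulses would be issued 0.25 ms apart. Like the 
hardware, this assumes the carriage speed changes little from one 
encoder interval to the next. 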

Development Methodology 

The HP DeskJet 820C was developed under some very tight 
time-to-market constraints. These constraints dictated that 
the latest CAE tools be used to speed the development of 
the ASIC. Additionally, the project team wished to have con- 
current design of the hardware and the firmware. This meant 
that the firmware team needed a platform on which to do 
development before the ASIC was finished. To meet this 
need, Aptix hardware emulators were used. 

The HP DeskJet 820C ASIC was designed entirely in the 
Verilog Hardware Description Language (HDL). HDLs are 
computer languages used to describe digital circuits. They 
contain constructs that allow designers to describe the func- 
tion of a circuit rather than the exact gates that are necessary 
to implement that function. Thus, HDLs allow designers to 
work at a higher level of abstraction than in the past. Once 
a designer has written the HDL for a circuit, a compiler 
program can synthesize the HDL into a gate-level design. By 
working at a higher level of abstraction, engineers can greatly 
increase their productivity. The time required to do the design 
of the HP DeskJet 820C ASIC was significantly decreased 
over past products. 

Since designing an ASIC using an HDL is analogous to writing 
a piece of software, it is not surprising that many of the 
practices used by software engineers can be used success- 
fully by hardware teams using HDLs. At the beginning of the 
project, coding conventions were established. Similar struc- 
tures in different designers' modules were coded similarly. 
Designers were encouraged to comment their code liberally. 
Code reviews were held during the project to find errors and 
to improve designers' coding practices. These techniques 
allowed designers to look at each other's code and quickly 
understand it. In addition to having obvious benefits for the 
HP DeskJet 820C project, the good coding practices will 
allow the HP DeskJet 820C hardware to be easily leveraged 
into future products. 

To synthesize the Verilog code into standard cell gates, Syn- 
opsys software was used. This software allows the designer 
to enter information about the design to help the software 
produce an optimum implementation. Synopsys-specific 
scripts were used to enter the required information. By 
using scripts, the designers were able to make changes to 
the code, and with minimum effort, synthesize the new im- 
plementation. Just as for the original Verilog code, conven- 
tions and templates for the scripts were developed. As a 
result of these techniques, in a few special cases engineers 
were able to modify code written by a different designer and 
synthesize new hardware very efficiently. 

An important part of ASIC design is test development. The 
HP DeskJet 820C ASIC design team used an HP proprietary 
technology that allowed the engineers to write test vectors 
directly in Verilog. These test vectors were used to verify 
that the functionality of the synthesized design matched the 
original Verilog. The same vectors were then translated into 
a format the ASIC tester understood, and used to test the 
finished silicon. Using this technique, a single set of test 
vectors was used throughout the project. In addition to the 
functional test vectors, scan testing was used to achieve the 
desired fault coverage. Since the insertion of scan hardware 
and the creation of scan test vectors is done semiautomati- 
cally, scan testing was successfully added to the ASIC with- 
out incurring a schedule delay. The use of scan testing did 
increase the cost of the chip because the scan circuitry 
caused the chip size to grow, but this was deemed accept- 
able when traded off against the time it would have taken 
the designers to write functional test vectors with adequate 
coverage. 

Hardware/Software Codesign 

To meet the overall project goal of a low-cost product, it was 
necessary to make careful trade-offs between the hardware 
and firmware in the product. Functions that are realized in 
hardware cost money because silicon real estate is used. 
Functions realized in firmware cost money because they 
require bits in memory. Since the ROM that holds the firm- 
ware in the HP DeskJet 820C is integrated into the system 
ASIC, its size had a hard upper limit. Also, the HP DeskJet 
820C uses a relatively low-power processor, so the process- 
ing bandwidth available to perform functions in real time is 
limited. With the standard cell logic, the processor, and the 
firmware ROM all integrated onto the same chip, optimal 
trade-offs between the three were especially important. 

To make the correct trade-offs, the ASIC engineers responsi- 
ble for particular hardware blocks coordinated closely with 
the firmware engineers responsible for the corresponding 
firmware blocks. This process allowed the hardware engi- 
neers to gain insight into how the firmware would use the 
block, and at the same time allowed the firmware engineers 
to have a good understanding of the hardware. This mutual 
understanding led to better trade-offs. The hardware was 
designed with just enough functionality to allow the firm- 
ware designer to implement the code within the product 
code size and processor bandwidth constraints, but without 
a lot of extra hardware thrown in "just in case." A secondary 
benefit of this approach was that the firmware engineers 
were able to write the code for blocks designed this way 
with few problems. Code for blocks not designed using this 
process (primarily blocks leveraged from previous products) 
proved much more problematic to bring up. 

ASIC Emulation 

To meet the aggressive schedule, it was necessary for the 
firmware team to begin implementing the code well before 
ASICs were available to run the code. In fact, firmware imple- 
mentation began before the ASIC design was even complete. 
To allow this activity to take place, it was necessary to set 
up an emulation environment. Traditional methods of doing 
such emulation include building a custom printed circuit 
board populated with one-time-programmable devices (anti- 
fuse devices, laser programmed parts, etc.) and software- 
based emulation using a previous product. Since the digital 
architecture for this ASIC was a significant departure from 
any previous product, emulation on a previous product 
would have been difficult at best and would have required a 
significant code port when the ASIC became available. One- 
time-programmable devices were not flexible enough to sup- 
port emulation before the design was functionally complete. 
For these reasons, the product team chose to use full-chip 
IC emulators from Aptix to support firmware development 
before ASICs were available. 

IC emulators are essentially a large array of SRAM-based 
FPGAs (field programmable gate arrays) with programmable 
interconnect between them. Software reads in a gate-level 
design for an IC, partitions it into the FPGAs, and then 
creates all the necessary files to program the FPGAs and the 
interconnect between them. The emulator then is functionally 
equivalent to the IC that will be fabricated. The emulators are 
highly flexible since they can be reprogrammed by simply 
downloading a new pattern into the SRAMs. The main draw- 
back of the IC emulators is that they generally are not able to 
run at the same speed as the final silicon. For this product, 
the emulator ran at one fourth of the final clock speed. 

Using this approach, the IC team was able to provide the 
firmware team with usable hardware approximately four 
months before silicon arrived. Since the ASIC design was 
not complete at that point, the first hardware provided was 
only a subset of the full standard cell logic. What was pro- 
vided was enough for the firmware team to begin writing 
and testing the operating system, the code that needed to 
be written first. As more blocks in the IC were completed, 
they were incorporated into the emulation system. 

In addition to providing early hardware to the firmware team 
for development purposes, the use of IC emulators allowed 
the hardware to be verified in the full printing system with 
actual firmware before being committed to silicon. Because 
the emulators ran at close to the system speed, several 
orders of magnitude more clock cycles of verification 
occurred on the emulators than with software simulation. 
Also, since it was real firmware running, the ASIC was put 
in states that would have been difficult or impossible to 
achieve in simulation because of the complexity of getting 
into that state. Finally, many unanticipated hardware/firm- 
ware interactions were discovered. The team was eventually 
able to print with the IC emulators, giving very high confi- 
dence in the functional correctness of the ASIC. 

Thanks to the emulators, two problems in the design were 
discovered and fixed before committing the design to silicon. 
Both problems were system interaction issues that would 
have been very difficult to discover through simulation alone. 
When silicon arrived, firmware was almost immediately 
bootable on it. The only things that needed to be changed 
in the firmware were things that were affected by the differ- 
ence in clock speed, and these had been deliberately coded 
to be easy to change. 
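
One common way to make clock-dependent code easy to change is to 
derive every timing value from a single clock constant. The sketch 
below illustrates the idea only; the article does not describe how the 
firmware was actually written, and the clock frequencies and names are 
hypothetical. 

```python
FINAL_CLOCK_HZ = 16_000_000              # hypothetical silicon clock
EMULATOR_CLOCK_HZ = FINAL_CLOCK_HZ // 4  # emulator ran at one fourth speed


def timer_ticks(duration_us, clock_hz):
    """Convert a delay in microseconds into timer ticks for a given clock."""
    return duration_us * clock_hz // 1_000_000
```

With this structure, moving from the emulator to silicon means 
changing one constant: timer_ticks(1000, EMULATOR_CLOCK_HZ) is 4000 
ticks, while timer_ticks(1000, FINAL_CLOCK_HZ) is 16000 ticks for the 
same 1 ms delay. 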

Regulatory Requirements 

Because the HP DeskJet 820C printer is sold in the consumer 
marketplace, it must meet all applicable consumer electronic 
regulations. Of particular interest to the electronic design of 
the product are the electromagnetic interference (EMI) and 
electrostatic discharge (ESD) requirements. EMI occurs 
when an electronic product creates an electric field and 
interferes with the correct operation of another electronic 
product. ESD occurs when an object (generally a human) 
that has built up a large static charge discharges to a second 
object (for example, an IC). In addition to government 
requirements on a product's level of EMI and sensitivity to 
ESD, HP maintains internal standards, which are generally 
tougher than the government standards. Meeting or exceed- 
ing these internal standards on every product is an important 
aspect of HP's reputation for high-quality, reliable products. 
Since the HP DeskJet 820C could not legally be released 
without meeting government regulations on EMI and ESD, 
failure to meet them was a significant schedule risk to the 
product. Therefore, both were addressed early during the 
design of the digital ASIC. 

In general, EMI results from improperly controlled high- 
frequency signals that travel a long distance, particularly 
signals that travel over cables. The trick in designing for 
reduced EMI is to control the signals with high-frequency 
content to the greatest possible extent. All I/O pads in the 
HP DeskJet 820C make use of an HP proprietary technology 
that compensates for process, voltage, and temperature 
(PVT) variations in the operating environment of the chip. 
The compensation ensures that the pads have nearly the 
same slew rate regardless of the PVT environment. (Typically, 
parts in an environment that causes the chip to run fast have 
about twice the slew rate of parts in an environment that 
causes the chip to run slow.) Of particular concern in this 
product were the digital signals that travel between the main 
logic board and the carriage board. These signals, which are 
in the megahertz range, travel approximately 19 inches along 
an unshielded and untwisted flex cable. This was deemed 
the most EMI-prone piece of the design. The I/O pads that 
drove these signals were designed to have as slow a slew rate 
as the signal speed would allow. As a backup system, phase- 
locked loop hardware and additional logic to dither the 
system clock or the signals on the flex cable was designed 
into the ASIC. The result of this design effort was successful 
passing of the EMI regulations on schedule. 

The other big regulatory threat, ESD, was recognized early 
in the project by both the designers and the ASIC vendor 
(ICBD, a division of HP). Because ESD events can cause 
damage that will not immediately destroy the chip but 
instead will cause it to fail months or even years later, inade- 
quate ESD protection can result in product reliability and 
customer satisfaction problems down the road. The chip 
needed to be able to withstand ESD events both before and 
after being put on a printed circuit board. When the chip 
is on a printed circuit board, external components can be 
placed around the chip to help protect it. However, each 
component adds cost to the product, so integrating protec- 
tion into the chip results in a cost savings. Also, although the 
chips will be in a relatively controlled environment before 
being put onto the printed circuit board, ESD events can and 
will occur, so the chip needs to be able to withstand them. 

Fig. 5. HP DeskJet 820C digital controller ASIC. 

When an ESD event occurs at the chip pins, a large current 
between the discharge point and ground is induced in the 
ASIC. Most structures internal to the ASIC cannot withstand 
such a large current without damage. Therefore, a chip de- 
signed to withstand ESD has structures that can withstand 
the high current, and routes the current to these structures 
rather than to the chip internals. Since an ASIC's only con- 
tact to the outside world is through its pads, ESD protection 
devices are generally located at the pads. During the design 
of the ASIC, the ASIC vendor assigned a dedicated engineer 
to evaluate the ESD design. A current-limiting resistor and a 
reverse-biased diode were used in each pad to limit the cur- 
rent that can reach the chip internals. Additionally, shunt 
structures between and ground were carefully designed 
and positioned in the IC. This early involvement by the ven- 
dor resulted in a solid design that meets HP ESD requirements. 

Conclusion 

By taking into account the HP DeskJet 820C's overall project 
goals of low cost and time to market from the start, a well- 
optimized digital controller for the printer was delivered on 
schedule. The chip is specifically designed for HP's Printing 
Performance Architecture. By integrating all digital func- 
tions in the printer on a single piece of silicon, the cost of 
the electronics in the product was greatly reduced over the 
previous-generation product while maintaining or increasing 
performance. The design team used the latest ASIC develop- 
ment tools to deliver a correctly functioning ASIC on a very 
tight schedule. Through the use of hardware emulators, the 
firmware team was able to begin coding before the final chip 
design had been released for manufacturing, further speeding 
the printer's design. All EMI and ESD requirements for the 
product were met on schedule. A lithograph of the final chip 
silicon is shown in Fig. 5. The chip area is approximately 
81 mm². 

By integrating the functions of four ICs into one new custom IC and then 
moving all the electronics related to the pens up to the carriage with the 
pens, significant savings were realized. A simple, low-contact-count, 
inexpensive flexible cable is used to connect the carriage to the main 
printed circuit assembly. 

by Huston W. Rice 



The project team for the HP DeskJet 850C printer developed 
the many elements of the printing system in parallel. In par- 
ticular, the print cartridges (called pens) were new designs, 
along with the electronics that control them. As a result of 
the pens being new designs, their drive and control require- 
ments were not completely defined, but were changing dur- 
ing the development program. The result of this was a sys- 
tem that worked well electrically, but was not fully 
optimized from a cost standpoint. 

In particular, two aspects of the pen drive system presented 
opportunities for significant cost reduction. First, the flexible 
cable connecting the carriage and pens to the main printed 
circuit assembly was very elaborate and fairly expensive. 
Second, the electronics that control the pens were imple- 
mented in four different analog ICs, three of them custom 
ASICs. 

With the advantage of being able to look back on the now 
well-defined system needs, a new approach was selected. 
By integrating the functions of the four ICs into one new 
custom IC and then moving all the electronics related to the 
pens up to the carriage with the pens, significant savings 
were realized. The new, highly integrated ASIC is less expen- 
sive to purchase and to assemble into the product. Since the 
signals are restricted to digital data and raw power, a simple, 
low-contact-count, inexpensive flexible cable is used to con- 
nect the carriage to the main printed circuit assembly. 

For this design approach to be successful in the HP DeskJet 
820C, several issues had to be overcome. For the greatest 
benefit, all of the electronics associated with the pens 
had to be contained on the carriage printed circuit assembly. 
Would it all fit? Because of a tight schedule and limited me- 
chanical engineering staffing, no mechanical changes could 
be made to the carriage assembly to make more room. An 
additional mechanical constraint was that no components 
could be placed on the bottom half of the printed circuit 
board, which was needed for the connectors for the pens. 
To get the circuits to fit, all the analog IC functions had to 
be integrated into a single ASIC. Could all the different func- 
tions — power control, digital I/O, sensitive analog-to-digital 
measurements, power drivers — be integrated into a single 
device? If all the analog functions from four ICs were in- 
tegrated into one IC, would there be thermal overheating 
issues in the IC? Would there be problems with radiated 
electromagnetic emissions from the digital interface to the 
carriage over a simple unshielded flexible cable? 

To provide an aspect of excitement to the program, once 
this approach was chosen, there was no easy alternative to 
fall back upon if the above issues could not be dealt with. 
If this design failed, the whole HP DeskJet 820C printer 
program would be put in jeopardy. 

Carriage Electronics Implementation 

A key architecture change was made in the pen drive and 
control electronics in the HP DeskJet 820C compared to the 
DeskJet 850C. The power supply for the pens was modified 
in two ways. First, two independent dc-to-dc converters are 
used to supply power to the black and color pens in the 
DeskJet 850C. In the DeskJet 820C, a single pen power sup- 
ply is used to drive both the black and color pens. Second, 
the control topology of the dc-to-dc converter was changed, 
as explained later in this article. The DeskJet 850C design 
requires seven large capacitors, two inductors, two power 
FETs, two power diodes, and several small discrete resistors 
and capacitors. All of this was replaced with two capacitors, 
one inductor, one power FET, one power diode, and one 
power resistor. This eliminates not only the need for several 
square inches of printed circuit board space that was not 
available on the DeskJet 820C carriage printed circuit board, 
but also the cost of the unneeded components. 

The two pens in the product (black and color) must be driven 
at different voltages, and the DeskJet 820C design now only 
has one power supply, which is shared between the pens. 
This forced a change in the way printing is done. In the Desk- 
Jet 850C, both pens can be driven at any time, allowing max- 
imum flexibility in how the printed image can be formed, and 
therefore maximum speed. In the DeskJet 820C, printing 
with the black and color pens alternates. For instance, black 
may be printed from right to left and then color from left to 
right. This difference costs a little in print speed for some 
color documents, but was key in enabling all the electronics 
to fit on the carriage printed circuit assembly. 

Several techniques were used to integrate all the pen elec- 
tronics onto the carriage for the HP DeskJet 820C. Beyond 
the power supply changes, the next most important step 
was designing a mixed-signal analog/digital/power ASIC that 
integrates all the functions required to drive and control the 
pens. The general strategy was to integrate all the relatively 
small-signal electronic functions into one ASIC to minimize 
the total component count. This both minimizes the cost 
and uses the minimum printed circuit board area on the very 
small carriage printed circuit assembly. However, to keep 
the ASIC silicon die area under control and to minimize the 
total power dissipated by the ASIC, several key components 
are not integrated. The power FET and diode for the dc-to-dc 
converter, both very large devices (from a silicon area point 
of view), are implemented as discrete devices externally. 
Two linear regulators are also implemented with off-the-shelf 
discrete devices to keep their power dissipation out of the 
ASIC package. Beyond these parts and some discrete capac- 
itors and inductors that cannot be integrated, everything else 
is internal to the ASIC. 

The process of developing the ASIC was the most difficult 
aspect of the carriage electronics design. Because of the 
high expected production volumes, at least two independent 
suppliers were needed. In the special mixed-signal/power IC 
industry, there is considerable process variation from one 
supplier to the next. However, only pin compatibility is 
required between sources. The two ICs do not have to be 
identical. Over a period of about six months, the analog 
ASIC was codeveloped by HP and the two suppliers. This 
allowed system design trade-offs to be made to keep both 
ASICs compatible. In addition, the overall program schedule 
demanded that the first pass of this full custom IC had to 
work, because there was only time for small revisions to the 
device before production started on the HP DeskJet 820C. 
As a result of excellent design teams at the suppliers and the 
careful codevelopment communication between HP and the 
suppliers, the first samples of the ASICs worked with only 
a few faults. Simple IC mask changes and the addition of a 
few small external components resulted in the system being 
completed on time. 

In addition to the mixed-signal ASIC, three printed circuit 
board layout techniques were used to get all the components 
to fit. First, a 3D map of available space for components on 
the circuit board was generated. With this, small parts could 
be tucked under mechanical components, and the larger 
components could be carefully placed to avoid mechanical 
interferences. Second, the placement of the components was 
very carefully designed to minimize interconnect distances 
and the number of vias required. Third, the classical layout 
placement design rules that govern component spacings 
were pushed or outright violated. Breaking the rules was 
justified because the alternative was changing to a two- 
sided surface mount assembly process, which is a much 
more expensive and unattractive alternative. 

In the end, all the parts were made to fit on the top side of 
the printed circuit board. An added benefit of the careful 
printed circuit board design is a very low-noise circuit. 
During the development process, we discovered many of the 
high-current switching circuits interfered with the sensitive 
measurement circuits. The compact layout provided signifi- 
cantly better performance than earlier prototype layouts. 

As might lie expected, Ihe low-noise layout is also low-noisc 
from an elect romagnelic radiation poinl of view. Steps taken 
to control ihe slew rales of Ihe digilal signals on Ihe flexible 
cable also proved effective in minimizing radiated emissions 
from the cable. 
Finally, the steps taken to minimize the power dissipation in
the mixed-signal ASIC were successful, to the point that the
IC operates at junction temperatures less than 100°C under
even the most extreme conditions.

Pen Operation 

The black and color pens in the HP DeskJet 800 family of
printers operate as a matched pair to deliver high-quality
color documents. The pens themselves are in many senses
the heart of the printer, and all of the electrical and
mechanical systems are designed to support and optimize their
performance. The electrical systems that drive and control the
pens accomplish two major tasks: they maintain the
temperature of the printheads to optimize the print quality, and they
drive the correct inkjet nozzles at the right times to print the
desired image on the paper.

The viscosity of the ink in the pens is sensitive to
temperature, and the size of the drops ejected by the pens is
sensitive to ink viscosity. By controlling the temperature of the
pens, the viscosity and therefore the drop size can be
controlled. Consistently sized drops provide the best print quality.
Integrated into the pens is a temperature sensor, which can
be used to measure and control the temperature of the pens,
and therefore the ink viscosity and drop volume.

In previous generations of inkjet pens, the task of driving
the pen nozzles has been fairly straightforward. The nozzles
were arranged into columns on the pens, and every nozzle
(or firing resistor) was controlled and driven directly. The
pens in the HP DeskJet 820C printer have 300 nozzles (black
pen) or 192 nozzles (color pen, 64 nozzles for each color).
This high nozzle count makes it impossible to drive each
nozzle directly with a dedicated signal and interconnection.
For these high-nozzle-count pens, a matrix drive technique
is used. The method is the same for the black and color
pens. The matrix drive has two benefits. First, the number
of connections to the pens is now much lower: 22 addresses
and 14 columns can select 22 × 14 = 308 nozzles. The
connection count is 22 + 14 + 14 = 50. (The second 14
connections are for the column ground return currents.) Second,
for power reasons, all the nozzles cannot be fired at one
time. For each nozzle, a 2.5-µs firing pulse is applied to the
nozzle resistor that boils the ink and ejects the drop. The
pulse voltage is about 10V and the current is 250 mA. If all
300 nozzles were driven at the same time, a total current of
75A at 10V would have to be available! This is impractical.
For power reasons alone, all the nozzles cannot be fired at
once, and the matrix drive provides a convenient way to
distribute the nozzle firing in time.
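
The arithmetic behind the matrix drive can be checked with a few lines of Python. This is only an illustrative sketch using the figures quoted above for the black pen; the constant and function names are ours, not part of the printer design.

```python
# Figures quoted in the text for the black pen; names are illustrative only.
ADDRESSES = 22            # address (row) lines
PRIMITIVES = 14           # primitive (column) lines
NOZZLES = 300             # nozzles actually present on the black pen
PULSE_CURRENT_A = 0.250   # firing current per nozzle resistor, in amperes


def matrix_connections(addresses, primitives):
    """Address lines + primitive lines + one ground return per primitive."""
    return addresses + primitives + primitives


selectable = ADDRESSES * PRIMITIVES                 # nozzles the matrix can select
connections = matrix_connections(ADDRESSES, PRIMITIVES)
all_at_once_current = NOZZLES * PULSE_CURRENT_A     # if every nozzle fired together

print(selectable, connections, all_at_once_current)
```

With a dedicated signal per nozzle, 300 drive lines plus returns would be needed; the matrix needs 50 connections.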

The pen is electrically constructed as a series of 22 address
inputs driving FETs in the rows of the matrix and 14
primitive inputs driving the firing resistors in the columns of the
matrix. Inside the pen are selection FETs for each nozzle
resistor; these can enable or disable a given nozzle.

The pen is driven one address row at a time. First, address
one is driven, turning on the selection FETs for the top row
of 14 nozzle firing resistors. Any of the 14 primitives are then
driven (all, a few, or none) with the previously mentioned
10V, 250-mA pulse to fire the desired nozzles for each column
associated with row 1. Address one is then turned off and
address two is driven, selecting the next row of FETs and
nozzle resistors. The desired primitives are again driven, but
this time firing nozzles associated with row 2. This process
is continued through address 22 and then repeated. By
sequencing through all 22 addresses, every one of the 300
nozzles can be selected and driven. (Note: 22 addresses
times 14 primitives yields 308 potential nozzles. Since the
pen only has 300 nozzles, eight of the combinations do not
have a selection FET or nozzle resistor.)
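
The row-at-a-time sequence can be sketched as follows. This is a minimal illustration, not the printer's firmware: nozzles are identified directly by a hypothetical (address, primitive) pair, because the actual mapping from nozzle number to matrix position is not given here.

```python
ADDRESSES = 22    # address rows, driven one at a time
PRIMITIVES = 14   # primitive columns, any subset driven per address


def firing_sequence(nozzles_to_fire):
    """Yield (address, primitives) steps for one pass through all addresses.

    nozzles_to_fire is a set of (address, primitive) pairs, both 1-based.
    For each address in turn, the primitives to pulse are those whose
    nozzle in that row has been requested (possibly none).
    """
    for address in range(1, ADDRESSES + 1):
        primitives = sorted(p for (a, p) in nozzles_to_fire if a == address)
        yield address, primitives


# Hypothetical request: two nozzles in row 1, one in row 2, one in row 22.
wanted = {(1, 1), (1, 14), (2, 3), (22, 7)}
steps = list(firing_sequence(wanted))
print(steps[0], steps[1], steps[-1])
```

At most 14 resistors are pulsed in any one step, which is what keeps the peak supply current manageable.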

Mechanically, the pen nozzles are arranged in a pattern to
generate proper images, even though all the nozzles are not
fired at the same time. The black pen in the HP DeskJet
820C is capable of 600-dpi printing. The print swath (the
band of ink printed in one pass) is 1/2 inch high, and the
columns of dots in the swath are 1/600 inch apart.

A simple example will illustrate how the nozzles are
arranged. Suppose we want to print a vertical line 1/600 inch
(one dot) wide. If the pen were constructed to fire all the
nozzles at the same time to print a vertical column, the
nozzles would be arranged in a vertical line on the pen. For
power reasons, the nozzles are fired in 22 different groups of
14 (the 22 addresses and 14 primitives), and those are not all
driven at the same time. Since the pen is continuously moving
while the nozzles are fired, the desired vertical line would
come out jagged or sloped if the nozzles were arranged in a
straight line on the pen. To get a straight line on the page,
the nozzles are staggered to compensate for the timing
differences of the firing (see Fig. 1).

There is one additional complicating factor: The 600-dpi
black pen has the nozzles arranged in two groups, odd
nozzles in one column (with some stagger to compensate for
the firing timing), and even nozzles in another column about
4 mm away. Two nozzle columns allow 1/300-inch spacing
between nozzles in a given column rather than 1/600-inch,
making the pen easier to manufacture.

Now, to print a vertical, one-dot-wide line on the page, the
odd nozzles are first driven, one address group at a time.
Some time later, after the pen/carriage assembly has moved
4 mm and the even nozzles are over the same location on
the paper, the even nozzles are driven, one address group at
a time.

The implication is that the data sent to the pens must be
sequenced properly to compensate for the nozzles being
fired in 22 different address groups and also for the 4-mm
odd/even nozzle spacing on the pen.
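
To give a feel for the size of the odd/even compensation, the sketch below converts the approximately 4-mm column spacing into 600-dpi dot columns. It is illustrative only: the spacing is quoted as "about 4 mm", the helper names are ours, and the assumption that the odd column reaches a given page position first is made purely for the example.

```python
MM_PER_INCH = 25.4


def column_offset_in_dots(spacing_mm, dpi):
    """Convert a physical nozzle-column spacing into printed dot columns."""
    return spacing_mm / MM_PER_INCH * dpi


def schedule_line(page_column, offset_columns):
    """Carriage positions (in dot columns) at which each nozzle group fires
    so that both land on page_column. Assumes the odd column leads."""
    return {"odd": page_column, "even": page_column + offset_columns}


offset = column_offset_in_dots(4.0, 600)
print(round(offset, 1))          # roughly 94.5 dot columns
print(schedule_line(100, 94))    # hypothetical whole-column offset
```

The even-nozzle data for a given page column therefore has to be delivered roughly 94 dot columns later than the odd-nozzle data, in addition to the smaller per-address stagger.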

The color pen is constructed in a similar manner. Just like
the black pen, the nozzles are fired one address at a time
and are staggered on the pen to compensate for this. The
first 16 of the 22 black address lines are shared between the
black and color pens, while the color pen has its own unique
12 primitive lines. The 16 addresses times 12 primitives are
sufficient to drive the 192 color nozzles, 64 per color. Like
the black pen, the nozzles of the three colors are placed in
dual odd and even columns. Between all the colors and the
odd and even columns there are six different color nozzle
placement columns, all with small-scale stagger. Therefore,
for a color document, the sequencing of data is even more
complex than for black printing. The data has to be timed
for the address and row sequential firing, separated for odd
and even nozzle columns, and tuned to compensate for the
displacement of the three colors with respect to each other.

Fig. 1. Black pen nozzle placement.

The task of sequencing the data for the pen nozzles was
traditionally handled by the printer digital electronics. In the
HP DeskJet 820C, the Printing Performance Architecture
(PPA) implementation moves the data sequencing task to
the host PC and driver, where a lot of data processing was
already being done. This relieves the printer of the burden of
this data sequencing task, and allows it to simply drive the
nozzles selected by the data coming into the printer.

Pen Drive Electronics Functions 

The pen control and drive electronics have two key tasks.
They provide the driving signals to eject the ink from the
pens, and they provide a temperature control system to
maintain a constant temperature in the active area of the
pen. To accomplish this, the overall carriage electronics
system has the following functions, most of which are
integrated into the mixed-signal ASIC (see Fig. 2):
• Two-way digital interface between the pen drive electronics
  and the main digital controller ASIC in the printer (see
  article, page 31). Data is sent to the carriage to control which
  pen nozzles to fire during printing and to give control
  commands for the analog-to-digital converter (ADC), pen power
  supply, and other circuits. Pen measurements made by the
  mixed-signal ASIC are sent to the digital ASIC.
• 22 address drivers to provide the 12V signals to turn on the
  FETs inside the pens and select the correct row to be driven.
  These drivers are shared by the black and color pens.
• 26 column drivers to provide the high-current drive that
  fires the ink out of the pens. 14 drivers are used by the black
  pen and 12 by the color pen.
• A 30W programmable dc-to-dc converter to provide
  precision power for the column drive signals.
• Two temperature control systems, one for each pen. Each
  consists of an ADC to make calibration measurements on
  the pen temperature sensor, DACs to set target
  temperatures for the pens, and control comparators and logic to
  implement the temperature control system.
• Electronics to measure the pen firing resistors, to determine
  if the pen is damaged.
• Circuitry to provide thermal protection to the analog ASIC
  and resetting functions.

The overall system provides all of the means necessary for
the digital ASIC and firmware to control the pens, both to
maintain their target temperature and to drive specific
nozzles to print the images desired by customers.

DC-to-DC Converter Design 

The dc-to-dc converter that provides regulated power to
drive the pens uses a new digital control technique. The
feedback control of the regulator is a simple, purely digital
system. If the voltage is too low, it turns on the regulator to
full power and charges the bulk-storage filter capacitor to
the target voltage as fast as possible. If the voltage is high, it
turns off the regulator completely. Fig. 3 is a block diagram
of the converter.
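
The on/off control rule described above can be illustrated with a very simplified simulation. This is a sketch under loud assumptions: the regulator is modeled as an ideal current source charging a capacitor, and the charge current, capacitance, time step, and target voltage are invented for the example rather than taken from the DeskJet 820C design.

```python
def simulate_on_off_regulator(target_v, steps, load_a,
                              charge_a=4.0, cap_f=100e-6, dt_s=1e-6):
    """Simulate a purely digital (on/off) regulator feeding a capacitor.

    Each time step: if the output is below target, the regulator runs at
    full power (charge_a); otherwise it is switched off completely.
    All component values are assumptions made for illustration.
    """
    v = 0.0
    history = []
    for _ in range(steps):
        regulator_on = v < target_v            # the entire control law
        supply_a = charge_a if regulator_on else 0.0
        v += (supply_a - load_a) * dt_s / cap_f
        v = max(v, 0.0)                        # output cannot go negative
        history.append(v)
    return history


trace = simulate_on_off_regulator(target_v=10.0, steps=3000, load_a=3.0)
ripple = max(trace[-500:]) - min(trace[-500:])
print(round(trace[-1], 2), round(ripple, 3))
```

Even in this crude model the output settles into a small band around the target with no compensation network, which is the property the text attributes to the technique.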



Fig. 2. Block diagram of the mixed-signal ASIC.




Fig. 3. Block diagram of the dc-to-dc converter.
This control technique has several advantages:
• The control system is very simple and is easily integrated.
• The control system is inherently very stable and does not
  require additional compensation components or controlled
  values for the inductor and capacitor.
• The effective bandwidth of the regulator is very high, so it
  responds to changes in the load very quickly. Since the load
  placed on the regulator by the pens can change from 0 to 3A
  in less than 100 ns, fast response time is useful.
• As a result of the inherent stability and fast bandwidth, the
  size of the bulk storage capacitor could be reduced. Only
  one capacitor is needed in the HP DeskJet 820C design
  where three identical parts were used in the DeskJet 850C
  design. This was a key contributor to the goal of getting
  everything to fit on the carriage printed circuit board.
• Additional simple digital control functions, such as
  overcurrent and undervoltage shutdowns, were easily integrated.

Beyond the savings in components, the biggest benefit that
this control topology presented was design flexibility. The
definition of the mixed-signal ASIC, which contains the
controller for the regulator, had to be finalized months before
any testing could be started. By externally generating the
clock for the regulator in the main digital ASIC under
firmware control, changes could be made to the regulator in
software, up to the day printer production began. Two of the
key parameters in the design of any switching power supply
are the switching frequency and the maximum duty cycle.
By moving the generation of the clock frequency and duty
cycle to the firmware and digital hardware, the final decision
on the clock parameters could be delayed until the system
was carefully tested and analyzed. For instance, as changes
were made to the printed circuit board layout, the clock was
fine-tuned to compensate for the differences in performance
that were seen. The clock can even be dynamically modified
to provide regulator behavior to match the printer operation
mode at any given time.

The net result is very successful. The programmable clock
was used to tune the regulator performance to match the
system need after many prototypes had been built and
characterized. The regulator switches 30W of power from a
poorly regulated 18V supply to a very well-regulated,
programmable voltage appropriate for either the black pen or
the color pen. The regulator uses only about 6 cm²
(~1 in²) of printed circuit board area.

The PA 7300LC Microprocessor: A
Highly Integrated System on a Chip



A collection of design objectives targeted for low-end systems and the 
legacy of an earlier microprocessor, which was designed for high-volume 
cost-sensitive products, guided the development of the PA 7300LC 
processor. 

by Terry W. Blanchard and Paul G. Tobin



In the process of developing a microprocessor, key decisions
or guiding principles must be established to set the
boundaries for all design decisions. These guiding principles are
developed through analysis of marketing, business, and
technical requirements.

Several years ago, we determined that we could best meet
the needs of higher-volume and more cost-sensitive products
by developing a different set of CPUs tuned to the special
requirements of these low-end, midrange systems. The
PA 7100LC was the first processor in this line, which
continues with the PA 7300LC.

This article will review the guiding principles used during the
development of the PA 7300LC microprocessor. A brief
overview of the chip will also be given. The other PA 7300LC
articles included in this issue will describe the technical
contributions of the PA 7300LC in detail.

Design Objectives

Although the PA 7300LC was targeted for low-end systems,
cost, performance, power, and other design objectives were
all given high priority. With the design objectives for the
PA 7300LC we wanted to:
• Optimize for entry-level through midrange high-volume
  systems (workstations and servers)
• Provide exceptional system price and performance
• Roughly double the performance of the PA 7100LC
• Provide a high level of integration and ease of system design
• Provide a highly configurable and scalable system for a
  broad range of system configurations
• Tune for real-world applications and needs, not just
  benchmarks
• Emphasize quality, reliability, and manufacturability
• Provide powerful, low-cost graphics capabilities for technical
  workstations
• Use the mature HP CMOS14C 3.3-volt 0.5-µm process
• Use mainstream, high-volume, and low-cost technologies
  while still providing the necessary performance increases
• Emphasize time to market through the appropriate leverage
  of features from previous CPUs.

Meeting Design Goals 

We began by leveraging the superscalar processor core
found in the PA 7100LC processor. First we investigated
the value of high integration. Next we added a very large
embedded primary cache, now feasible with the 0.5-µm
technology. Then we enhanced the CPU core to take
advantage of the new on-chip cache by reducing pipeline stalls.
We also ensured high manufacturing yields by adding cache
redundancy.

We found that integration supported our design goals in
many positive ways. Because the primary cache, the
secondary cache controller, and the DRAM controller could be on
the same chip (see Fig. 1), we had an opportunity to design
and optimize them together as a single subsystem. This was
a large factor in allowing us to achieve such an aggressive
system price and performance point. The high-integration
approach also yielded much simpler system design options
for our system partners. To further support these partners,
we designed the integrated DRAM, level-2 cache, and I/O
bus controller with extensive configurability (see the sidebar
"Configurability of the PA 7300LC"). This configurability
enabled a wide variety of system options ranging from
compact and low-cost systems to much more expandable,
industrial-strength systems.

We were careful not to take a cost-first approach to this
design. We believe that performance is just as important for
customers of HP's lower-cost systems. We took a total
system approach in optimizing performance while emphasizing
application performance over benchmarks in making design
trade-offs. The highly optimized memory hierarchy shows
dramatic improvement for the memory-intensive programs
found in technical and commercial markets.

Another way of meeting our performance goals was to push
the frequency while increasing the level of integration. We
focused early on the layout and floor plan of the chip to
enable higher-frequency operation. Through this effort, all
critical paths were optimized. We tracked and optimized
62,000 individual timing paths during the design phase.

Despite leveraging the design from an existing CPU, the
PA 7300LC design team still evaluated a large array of
technical features and alternatives to meet our performance
goals. Fundamentally, our approach was to build a robust
CPU using a simple, efficient microarchitecture. Such a
design ran less risk of functional bugs and allowed physical
designers more leeway to push their circuits for higher
performance.

Fig. 1. PA 7300LC system design.



On-Chip Primary Cache Decisions 

It was clear from the beginning that the CMOS14C process
would allow an on-chip cache of reasonable size, so a
significant investigation was done to determine an optimal cache
size and configuration. HP's System Performance Lab in
Cupertino, California assisted us by repeatedly running
benchmarks and code traces with different cache topologies
and memory latencies.

Optimal Cache Size. Finding a balance between
instruction-cache and data-cache sizes was difficult. The PA 7300LC
was intended for use in both technical markets, where
larger data caches are desired, and commercial markets,
where programs favor large instruction caches. The
standard industry benchmarks can easily fool designers into
using smaller instruction caches, trading the space for more
data cache or simply keeping the caches small to increase
the chip's frequency. HP has always designed computer
systems to perform well on large customer applications, so
we included them in our analysis. Ultimately, we found that
equally sized caches scaled extremely well with larger code
and data sets. The typical performance degradation found
when a program begins missing cache was mitigated by
large cache sizes and our extremely fast memory system.

We could physically fit 128K bytes of cache on the die, so it
was split into 64K bytes for the instruction cache and 64K
bytes for the data cache. Not only would this provide
impressive performance, but we noted that it would be
the largest on-chip cache of any microprocessor when it
began shipping.

Cache Associativity. Cache associativity was another issue.
Recent PA-RISC implementations have used very large
directly mapped (off-chip) caches. Associativity would reduce
the potential for thrashing in the relatively small 64K-byte
caches, but we were worried about adding a critical timing
path to the physical design: selecting the right way* of
associativity and multiplexing data to the cache outputs.
Increasing the ways of associativity would further reduce
the thrashing, but make the timing even worse. The System
Performance Lab included associativity in their performance
simulations, helping us arrive at our decision to implement
two-way caches. To reduce the impact on timing, we
eliminated cache address hashing, which had been used to
reduce thrashing in directly mapped cache designs. Once we
added associativity, hashing was no longer necessary.

Associative cache designs also need an algorithm for
determining which way to update on a cache fill. Again, there are
many alternatives, but our simulations showed the easiest
approach to be the best. A pointer simply toggles on each
fill, so that the ways alternate.
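
The toggling fill pointer can be illustrated with a small model of a two-way associative cache. This is a sketch, not the PA 7300LC logic: the class name, the tiny set count, and the use of a single pointer shared by all sets are assumptions made for the example.

```python
class TwoWayCache:
    """Toy two-way set-associative cache with a toggling fill pointer."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.tags = [[None, None] for _ in range(num_sets)]
        self.fill_way = 0            # way that the next fill will overwrite

    def access(self, line_address):
        """Return True on a hit; on a miss, fill and toggle the pointer."""
        index = line_address % self.num_sets    # plain indexing, no hashing
        tag = line_address // self.num_sets
        if tag in self.tags[index]:
            return True
        self.tags[index][self.fill_way] = tag
        self.fill_way ^= 1                       # ways alternate on each fill
        return False


cache = TwoWayCache(num_sets=4)
# Lines 0 and 4 map to the same set; with two ways they can coexist.
results = [cache.access(a) for a in (0, 4, 0, 4)]
print(results)
```

In a direct-mapped cache of the same size the two lines would evict each other on every access; here the second pair of accesses both hit.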

Other Cache Decisions 

Many other cache decisions fell out of the same types of 
analysis. The data cache uses a copy-back rather than a 
write-through design** and a 256-bit path to the memory 
controller was included for single-cycle writes of copyout 
lines as shown in Fig. 2. 
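
The difference in memory traffic between the two write policies can be shown with a deliberately simple count. The sketch assumes every store hits in the cache and that dirty lines stay resident until one final copy-out; the function name and policy strings are ours.

```python
def memory_writes(store_lines, policy):
    """Count main-memory writes for a sequence of stores to cache lines.

    store_lines lists the cache line touched by each store. Assumes all
    stores hit and that dirty lines are copied out once at the end.
    """
    if policy == "write-through":
        return len(store_lines)          # every store also goes to memory
    if policy == "copy-back":
        return len(set(store_lines))     # one copy-out per dirty line
    raise ValueError(policy)


stores = [0, 0, 1, 0, 1, 2]
print(memory_writes(stores, "write-through"), memory_writes(stores, "copy-back"))
```

Repeated stores to the same line cost nothing extra under copy-back, which is why a wide, single-cycle copy-out path is the operation worth optimizing.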

Moving the caches onto the chip also simplified changing
the CPU pipeline to remove the "store-tail" penalty, in which
stores on consecutive cycles cause a hang. This made it
easier for compilers to optimize code.

* Way, or N-way associativity, is a technique used to view a single physical cache as N equally
sized logical subcaches. The PA 7300LC caches are two-way associative, so each 64K-byte
cache has two ways of 32K bytes each. This provides two possible locations for any cached
memory data, reducing the thrashing that can occur in a direct-mapped cache when two
memory references are vying for the same location.

** In a write-through cache design, data is written to both the cache and main memory on a
write. In a copy-back cache design, data is written to the cache only, and is written to main
memory only when necessary.

Configurability of the PA 7300LC



Rather than choosing a single, inflexible memory and level-2 cache
configuration, we architected the PA 7300LC so that system designers can
make price and performance trade-offs themselves. Most of the choices
available to designers are in the memory system.

Bus Frequencies 

The PA 7300LC GSC (general system connect) bus interface supports
several CPU:GSC frequency ratios. GSC frequencies at or near the bus
maximum frequency of 40 MHz can be maintained even when the CPU
is running at noninteger multiples of the bus frequency (e.g., 132 MHz).

Memory Interface 

The memory interface can be designed with either 64-bit or 128-bit
(72-bit or 144-bit with error correction) data paths. A maximum of 16
memory banks is supported, and each bank can hold from 8M bytes to
512M bytes of DRAM. The DRAM technology can be either FPM (fast
page mode) or EDO (extended data out), with chip sizes from 4M bits to
256M bits. A broad range of DRAM speeds is allowed, as DRAM timing
can be software programmed using a nine-element MIOC (memory and
I/O controller) timing vector.

Memory error correction is optional. Single-bit correct and double-bit and
four-bit burst error detection schemes are available, all with sufficient
error logging for system diagnosis and program data protection.

Level-2 Cache 

The level-2 cache is completely optional. Three types of SRAM are
supported: register-to-register, flow-through, and asynchronous. Depending
on the SRAM speed and CPU frequency, level-2 cache latencies of two,
three, or four CPU cycles can be programmed into the MIOC. Parity error
protection on the SRAM data is also optional.



Adding Spare Columns to the Cache Arrays. Manufacturability
is a big concern for large VLSI memory structures like the
PA 7300LC's caches. Dense, regular structures like cache
RAM cells are very susceptible to the smallest manufacturing
defects, and just one failing bit out of 1,200,992 can make a
part useless. To compensate, the cache design team added
spare columns to the cache arrays. During the initial wafer
test of a CPU die, an internal built-in self-test (BIST) routine
runs to check for errors. If a bad RAM cell is found, the BIST
signature indicates which column should be swapped out,
and a laser is used to blow a special metal fuse on the chip.
The bad column is replaced with the spare, fully restoring
the chip's functionality. A later article in this issue describes
this feature in detail.
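
The test-and-repair flow can be sketched in software. Everything here is hypothetical and greatly simplified: a tiny array with one spare column, a stuck-at-0 fault model, and a remap table standing in for the laser-blown fuse. It is not the PA 7300LC BIST or redundancy circuit.

```python
class ColumnRepairArray:
    """Toy RAM array with one spare column and a fuse-style column remap."""

    def __init__(self, rows, columns):
        self.columns = columns
        self.spare = columns                    # physical index of the spare
        self.cells = [[0] * (columns + 1) for _ in range(rows)]
        self.stuck = set()                      # physical (row, col) stuck at 0
        self.remap = {}                         # logical column -> physical

    def _physical(self, col):
        return self.remap.get(col, col)

    def write(self, row, col, bit):
        p = self._physical(col)
        self.cells[row][p] = 0 if (row, p) in self.stuck else bit

    def read(self, row, col):
        return self.cells[row][self._physical(col)]

    def self_test(self):
        """Write and read back 1s then 0s; return failing logical columns."""
        bad = set()
        for pattern in (1, 0):
            for r in range(len(self.cells)):
                for c in range(self.columns):
                    self.write(r, c, pattern)
                    if self.read(r, c) != pattern:
                        bad.add(c)
        return sorted(bad)

    def blow_fuse(self, bad_col):
        """Permanently steer accesses for bad_col to the spare column."""
        self.remap = {bad_col: self.spare}


array = ColumnRepairArray(rows=4, columns=8)
array.stuck.add((2, 5))                 # inject one defective cell
before = array.self_test()
array.blow_fuse(before[0])
after = array.self_test()
print(before, after)
```

A single defective cell condemns its whole column, but swapping in the spare makes the array test clean again.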

Integrated Memory and I/O Controller Decisions. Incorporating
the memory and I/O controller (MIOC) onto the PA 7100LC
chip was an important performance win, and we worked to
make it even better on the PA 7300LC. Simply having the
MIOC and CPU on the same die is extremely efficient.
An off-chip MIOC would require a chip crossing for each
data request and data return. Chip crossings are
time-consuming, costing many chip cycles at 100 MHz. Since
the CPU stalls on a critical request, chip crossings directly
degrade performance.

Chip crossings also require additional pins on packages,
driving up the cost. As a result, designers strive to keep external
data paths narrow. With the MIOC on-chip, we were able to
use wider data paths liberally for faster transfers. We placed
some of the MIOC's buffers inside the cache and used wider
data paths to create a bus that is one cache line wide for
blasting cache copyouts to the MIOC in one cycle.

Cost and Performance Decisions. Despite all the performance
enhancements, the increased CPU frequency placed a
burden on the MIOC to minimize memory latencies and pipeline
stalls because of filled request queues. Blocking for an
off-chip resource costs more CPU cycles at higher frequencies,
so it was paramount that the PA 7300LC MIOC be fast and
efficient. The challenge was in achieving this without
significantly increasing the system cost.

Doubling the external memory data path to 128 bits was a
clear performance advantage, but it also increased system
cost. Adding 72 (64 data + 8 error correction) pins to the
CPU die and package came at a price. We were concerned
that system designers would also be forced to create more
expensive memory designs. Configurability was the best
solution. The increased performance warranted adding pins
to the CPU, but the MIOC was designed to support a 64-bit
mode for less expensive memory designs in low-cost systems.

Off-Chip Second-Level Cache Performance. In addition to the
primary cache, one of the PA 7300LC's most intriguing
features is its second-level cache (see Fig. 3). Even with the
MIOC's very fast memory accesses, it takes at least 14 CPU
cycles for cache miss data to be returned. While this is
excellent by industry standards, we had the opportunity
to make it even faster by implementing an off-chip
second-level cache.

In many cases, the CPU is stalled during the entire memory
access. A typical second-level external cache design could
drastically reduce the number of stall cycles, but would be
expensive. The engineering pros and cons were debated,
and a very interesting solution was found. Address pins for a
second-level cache were added to the CPU, but the
second-level cache and DRAMs share the memory data lines (either
64 or 128). Very fast FET switches are used to shield
second-level cache accesses from the heavy DRAM line loads until it
is determined that the second-level cache will miss. While
adding one cycle to memory accesses, this technique
reduces access time to only six cycles on a second-level cache
hit. The second-level cache is optional for low-cost systems
or for those applications where a second-level cache is not
beneficial.
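
The trade-off can be put in numbers with a short calculation. This sketch treats the quoted minimum of 14 cycles as the typical memory latency, which is an assumption; the hit rates passed in are hypothetical inputs, not measured values.

```python
L2_HIT_CYCLES = 6                  # second-level cache hit, from the text
MEMORY_CYCLES_NO_L2 = 14           # quoted minimum, treated here as typical
MEMORY_CYCLES_WITH_L2 = MEMORY_CYCLES_NO_L2 + 1   # one cycle added on an L2 miss


def average_miss_penalty(l2_hit_rate):
    """Average cycles to service a primary-cache miss with the L2 fitted."""
    return (l2_hit_rate * L2_HIT_CYCLES
            + (1 - l2_hit_rate) * MEMORY_CYCLES_WITH_L2)


def break_even_hit_rate():
    """L2 hit rate at which the average equals the no-L2 latency.

    Solves h*6 + (1 - h)*15 = 14 for h.
    """
    return ((MEMORY_CYCLES_WITH_L2 - MEMORY_CYCLES_NO_L2)
            / (MEMORY_CYCLES_WITH_L2 - L2_HIT_CYCLES))


print(average_miss_penalty(0.5), round(break_even_hit_rate(), 3))
```

Under these assumptions the second-level cache pays for its extra miss cycle once about one miss in nine hits in it, which is consistent with leaving it optional for workloads that would not reach that rate.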

MIOC Design Enhancements. Internally, the MIOC design was
enhanced in many areas over the PA 7100LC MIOC. The
internal pipeline was split into independent queues for memory
and I/O, preventing memory stalls during long I/O operations.
Reads can be promoted ahead of memory writes to satisfy
CPU requests rapidly, and graphics writes are accelerated
ahead of other transactions to increase graphics bandwidth.
Finally, the GSC (general system connect)* interface was
enhanced to improve graphics bandwidth by well over 200%
over the PA 7100LC and to support a broader range of
CPU:GSC operating ratios.

CPU Core Decisions 

Removing the Phase-Locked Loop. Because of its higher oper- 
ating frequency, the original PA 7300LC design contained 
a phase-locked loop circuit to synthesize both CPU and 
system clocks. Designing a phase-locked loop in a digital 
CMOS process is challenging and historically has affected 
yield and robustness in VLSI designs. When an inexpensive 
external clock part was found, we decided to recover the 
phase-locked loop circuit area and reduce technical risk by 
removing it. 

Integer and Data Cache Controller Enhancements. The on-chip 
caches caused both the integer and data cache controllers to 
be redesigned, and significant enhancements were included 
in both. The data cache controller added a deeper store 
buffer, and by also modifying the instruction pipeline, we 
were able to eliminate completely the store-tail problem 
mentioned earlier. Also, memory data is bypassed directly to 
execution units before error correction, with later notifica- 
tion in the rare event of a memory bit error. 

The instruction cache controller expanded the instruction
lookaside buffer (ILAB) from one entry to four, and
improved the performance of bypassing instructions directly
from the MIOC to the execution units. Both are very tightly
coupled to the MIOC so that memory transfers to and from
the caches are extremely fast.

* The GSC is the local bus that is designed to provide maximum bandwidth for
memory-to-graphics transfers.

Summary

We developed a set of guiding principles based upon
marketing, business, and technical requirements for this system.
The guiding principles enabled the design of an exceptional
microprocessor targeted to the volume and
price/performance requirements of the workstation and server market.
A large part of the overall success of this design comes from
the well-engineered cache and memory hierarchy. The
technology we chose allowed us to develop a high-capacity
primary cache and a rich set of performance-improving
features.

The PA 7300LC design met its schedule and exceeded its
performance goals. Customers are receiving
PA 7300LC-based systems today.

Functional Design of the HP
PA 7300LC Processor



Microarchitecture design, with attention to optimizing specific areas of 
the CPU and memory and I/O subsystems, is key to meeting the cost and 
performance goals of a processor targeted for midrange and low-end 
computer systems. 



by Leith Johnson and Stephen R. Undy 



The PA 7300LC microprocessor is the latest in a series of
32-bit PA-RISC processors designed by Hewlett-Packard.
Like its predecessor, the PA 7100LC,1,2 the PA 7300LC
design focused on optimizing price and performance. We
worked toward achieving the best performance possible
within the cost structures consistent with midrange and
low-end systems. This paper describes the
microarchitecture of the two main components of the PA 7300LC: the
CPU core and the memory and I/O controller (MIOC).

CPU Core Microarchitecture Design 

Approximately one-half of the engineering effort on the 
PA 7300LC processor was dedicated to the design of the 
CPU core. The CPU core includes integer execution units,
floating-point execution units, register files, a translation
lookaside buffer (TLB), and instruction and data caches. 
Fig. 1 shows a block diagram of the CPU core. 

Core Design Objectives 

The design objectives for the PA 7300LC processor were to
provide the best possible performance while choosing the
proper set of features that would enable a system cost appropriate
for entry-level and high-volume workstation products.
To reach this goal, we integrated large primary caches
on the processor chip and developed a tight coupling between
the CPU core and the memory and I/O subsystems.
The design objectives for the PA 7300LC are discussed in
detail in the article on page 43.

CPU Core Differences 

The PA 7300LC CPU core is derived from the PA 7100LC CPU
design.1 Although the PA 7300LC has many similarities
with its predecessor, there are some key differences in the
design that allowed us to meet our performance objectives.
The first difference is that the PA 7300LC runs at 160 MHz
compared to only 100 MHz for the PA 7100LC. The most
obvious difference is the large primary instruction and data
caches integrated directly onto the PA 7300LC chip. The
PA 7100LC only has a small (1K-byte) instruction cache on
the chip. Also, the organization of the caches was changed
to avoid many of the stall cycles that occur on the PA 7100LC.
The cache organization is discussed later in this article. The 
PA 7300LC has a 96-entry TLB, compared to 64 entries on 
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Fig. 2. The PA 73QQLC pipeline diagram. 

the PA 7100LC. Finally, the PA 7300LC has a four-entry
instruction lookaside buffer (ILAB) while the PA 7100LC's
ILAB contains one entry.

Pipeline and Execution Units 

Like all high-performance microprocessors, the PA 7300LC 
is pipelined. What is notable about the PA 7300LC pipeline 
is that it is relatively short at six stages, while running at 
160 MHz. 

Fig. 2 shows a diagram of the PA 7300LC pipeline. It does
not differ greatly from the pipelines used in the PA 7200,
PA 7100LC, or PA 7100 processors.1,3,4 The following operations
are performed in each stage of the pipeline shown in
Fig. 2:

1. Instruction addresses are generated in the P stage of the
pipeline.

2. The instruction cache is accessed during the F stage.

3. The instructions fetched are distributed to the execution
units during the first half of the I stage. During the second
half of the I stage, the instructions are decoded and the
appropriate general registers are read.

4. The integer units generate their results on the first half
of the B stage. Memory references, such as load and store
instructions, also generate their target address during the
first half of the B stage.

5. Load and store data is transferred between the execution
units and the data cache on the second half of the A stage.

6. The general registers are set on the second half of the
R stage.

Superscalar Processor. The PA 7300LC is a superscalar processor,
capable of executing two instructions per pipeline
stage. This allows it, at 160 MHz, to execute at a maximum
rate of 320 million instructions per second. This, however, 
is a peak rate that is rarely achieved on real applications. 
The actual average value varies with the application run. 
The theoretical maximum assumes the proper mix of in- 
structions, but not every pair of instructions can be bundled 
together for execution in a single cycle. Fig. 3 shows which 
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pairs of instructions can be bundled for execution in a single 
pipeline stage. 
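
The peak figure quoted above follows directly from the clock rate and the issue width. The short sketch below reproduces that arithmetic. The function names and the "bundled fraction" parameter are illustrative assumptions, not terms from the design, and the effective-rate estimate ignores all stalls.

```python
def peak_rate_mips(clock_mhz, instructions_per_cycle):
    """Peak instruction rate in millions of instructions per second."""
    return clock_mhz * instructions_per_cycle


def effective_rate_mips(clock_mhz, bundled_fraction):
    """Rate when only a fraction of cycles issue a two-instruction bundle.

    bundled_fraction is a hypothetical, application-dependent number;
    stall cycles are ignored in this simplified estimate.
    """
    return clock_mhz * (1 + bundled_fraction)


# Values from the text: 160 MHz clock, two instructions per cycle.
print(peak_rate_mips(160, 2))         # 320
print(effective_rate_mips(160, 0.5))  # 240.0
```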

Delayed Branching. The PA-RISC architecture includes delayed 
branching/' That is, a branch instruction will not cause the 
program counter to change to the branch address until after 
the following instruction is fetched. Because of this, branches 
predicted correctly with a simple branch prediction scheme 
execute without any pipeline stalls. The majority of the re- 
maining branches execute with only a single stall (see Fig. 4). 
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(lop : Floating-Point Compulation 

ldw : Simple Load Word
stw : Simple Store Word
ldst : All Other Loads and Stores
flex : Integer ALU
mm : Shifts
nul : Can Cause Nullification
br : Branches

* Instructions will combine if they reference two different words in
the same double word.

X : Valid Superscalar Combinations

Fig. 3. Valid superscalar instruction combinations for the PA 7300LC.
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Fig. 4. Branch behavior, (a) Correctly predicted branch, 
(b) Incorrectly predicted branch.



Two Integer Execution Units. The PA 7300LC contains two 
integer execution units. Each contains an ALU (arithmetic 
logic unit) that handles adds, subtracts, and bitwise logic 
operations. Only one unit, however, contains a shifter for 
handling the bit extract and deposit instructions defined in 
the PA-RISC architecture. Since only one adder is used to 
calculate branch targets, only one execution unit can pro- 
cess branch instructions. This same unit also contains the 
logic necessary to calculate nullification conditions.* By 
limiting execution to only one branch or nullifying instruction 
per pipeline stage, we avoided a great deal of functional com- 
plexity. Finally, only one unit contains the logic to generate 
memory addresses. Since the data cache is single-ported, 
there is no need to have two memory addresses generated
per cycle. In special cases, however, two integer load or
store instructions may be bundled together, provided they
use the same double-word address. As mentioned before,
these asymmetries between the integer units prevent any
two arbitrary integer instructions from bundling together.
However, even with this limitation, compilers are able to 
take advantage of the integer superscalar capabilities of 
the PA 7300LC. 

Multimedia Instructions. The PA 7300LC integer units imple- 
ment a set of instructions first introduced on the PA 7100LC 
that accelerate multimedia applications. 1 - 1 ' These instruc- 
tions allow each integer unit to perform two 16-bit adds, 
subtracts, averages, or shift-and-adds each cycle. Because
of superscalar execution, the PA 7300LC can execute four
of these operations per cycle for a peak rate of 640 million
operations per second.
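
To illustrate what "two 16-bit adds" in one integer operation means, the sketch below treats a 32-bit word as two independent 16-bit lanes. The function name is hypothetical, and only simple wraparound arithmetic is modeled; any other overflow behavior of the real instructions is outside this sketch.

```python
MASK16 = 0xFFFF


def halfword_add(a, b):
    """Add two 32-bit words as two independent 16-bit lanes.

    A carry out of the low lane must not spill into the high lane,
    so each lane is added and masked separately.
    """
    high = ((a >> 16) + (b >> 16)) & MASK16
    low = ((a & MASK16) + (b & MASK16)) & MASK16
    return (high << 16) | low


# High lanes: 0x0001 + 0x0002 = 0x0003. Low lanes: 0xFFFF + 0x0001 wraps to 0x0000.
print(hex(halfword_add(0x0001FFFF, 0x00020001)))  # 0x30000

# Peak rate from the text: 2 lanes x 2 integer units x 160 MHz.
peak_mops = 2 * 2 * 160
print(peak_mops)  # 640
```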

Floating-Point Unit. The PA 7300LC contains one floating-point 
unit. This unit contains a floating-point adder and a
floating-point multiplier. The adder takes two cycles to cal- 
culate a single- or double-precision result. It is pipelined so 
that it can begin a new add every cycle. The multiplier takes 
two cycles to produce a single-precision result and three 

* The PA-RISC architecture enables certain instructions to conditionally nullify or cancel the
operation of the following instruction based on the results of the current calculation or
comparison.



cycles for a double-precision result. It can begin a new 
single-precision multiply every cycle and a new double- 
precision multiply every other cycle. Divides and square 
roots stall the CPU until a result is produced. It takes eight 
cycles for single-precision and 15 cycles for double-precision 
operations. 

Instruction Cache and ILAB

Integrating a large primary instruction cache onto the pro- 
cessor chip broke new ground for PA-RISC microprocess- 
ors. In the past, our processor designs relied on large exter- 
nal primary caches. With the PA 7300LC, we felt that we 
could finally integrate enough cache memory on the processor
chip to allow fast execution of real-world applications.
Indeed, we have integrated twice as much cache on-chip as 
the PA 7100LC used externally in the HP 9000 Model 712/60 
workstation (i.e., 128K bytes versus 64K bytes). The inte- 
grated cache not only improves performance but also 
reduces system cost, since an external cache is no longer 
mandatory. 

Primary Instruction Cache. The PA 7300LC primary instruction
cache holds 64K bytes of data and has a two-way set associative
organization. A set associative cache configuration is
difficult to achieve with an external cache, but much more
practical with an integrated cache. When compared to a
similarly sized directly mapped cache, it performs better
because of higher use and fewer collisions. We chose a two-way
associative cache over other ways to save overhead
caused by the replication of comparators and to reduce the
propagation delay through the way multiplexer.

The primary instruction cache is virtually indexed and physically
tagged. Because the PA-RISC architecture restricts
aliasing** to 1M-byte boundaries, we could use a portion of
the virtual address (in this case, three bits) to form the index
used to address the cache. To avoid using virtual address
bits would have required us either to place the virtual-to-physical
translation in series with cache access (increasing
the cache latency) or to implement a large number of ways
of associativity (in the case of a 64K-byte cache, this would
have required a 16-way set associative organization).
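
The "three bits" and "16-way" figures can be checked with a little arithmetic. The sketch below assumes the 4K-byte page size given later in this article for the TLB page entries; the function names are illustrative only.

```python
CACHE_BYTES = 64 * 1024   # primary instruction cache size (from the text)
PAGE_BYTES = 4 * 1024     # page size (from the TLB description in this article)


def virtual_index_bits(cache_bytes, ways, page_bytes):
    """Index bits that fall above the page offset and so must come
    from the virtual address when the cache is indexed before translation."""
    bytes_per_way = cache_bytes // ways
    way_bits = bytes_per_way.bit_length() - 1     # log2 for powers of two
    page_bits = page_bytes.bit_length() - 1
    return max(0, way_bits - page_bits)


def ways_for_untranslated_index(cache_bytes, page_bytes):
    """Associativity needed so that each way is no larger than one page."""
    return cache_bytes // page_bytes


print(virtual_index_bits(CACHE_BYTES, 2, PAGE_BYTES))        # 3
print(ways_for_untranslated_index(CACHE_BYTES, PAGE_BYTES))  # 16
```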

Data Array Requirements. The instruction cache is composed
of a tag array and a data array, containing addresses and
instructions, respectively. Without using more wires or sense
amplifiers than those found in a conventional cache organization, we
organized the data arrays in an unusual fashion in the primary
caches on the PA 7300LC to meet two requirements.

The first requirement is for the instruction cache to supply 
two instructions per cycle to the execution units. Because 
the cache is two-way set associative, each location, or set, 
contains instructions corresponding to two distinct physical 
addresses. Thus, for any given set (determined by the in- 
struction fetch address), there are two possible choices for 
the instructions being read. Each of these choices is called a 
group (see Fig. 5a). For speed reasons, both groups are read 
from the instruction data arrays simultaneously. Logic that 
compares the physical addresses in the tag arrays (one per 
group) with the physical address being fetched from the data 

" * Aliasing refers to intentionally allowing two different virtual addresses to map to the same 
physical addtess The PA-RISC atchilectuie restricts the number and location ol bits that may 
differ between two virtual addresses 



Fig. 5. (a) Conventional two-way cache organization. (b) Checkerboard
instruction fetch. (c) Checkerboard full line access.

array determines which group is selected and sent to the
rest of the CPU. Since this is the normal instruction fetch
operation, it must be completed in a single processor cycle.

The second requirement is to be able to write eight instructions
simultaneously, all to the same group. Because a write
occurs as part of the cache miss sequence, it is important
that the write take only a single cycle to interrupt instruction
fetches as little as possible.

Fig. 5a shows the conventional method of addressing data
arrays. Because of electrical and layout considerations, the
upper four instructions of each eight-instruction-long cache
line are kept in a separate array from the lower four instructions.
Both the upper and lower arrays are addressed and
read concurrently. There are four arrays in total: group 0
upper, group 0 lower, group 1 upper, and group 1 lower.
The instruction fetch address sent to the instruction
cache, Address[0:11], contains twelve bits. One address bit,
Address[10], selects between the upper and lower arrays. The
rest of the address bits, Address[0:9] and Address[11], go to all
four arrays and determine which set (Set[x]) is read out of the
arrays. This is accomplished with the 11-to-2048 decoders.
In reality, four decoders, one for each array, would be
needed, but they all connect to the same address. As discussed
above, there are two possible pairs of instructions to
choose from with a given address. A signal from logic called
hit compare selects between the two possibilities. In the
example shown in Fig. 5a, instructions 0 and 1 from group 0
are selected from the instruction cache.

The conventional approach meets our first requirement.
However, it does not meet our second requirement. It cannot
access all eight instructions as a single group simultaneously.
This is because a cache line is located in two adjacent sets
and only half of the line can be read (or more important,
written) at any one time. For example, if the group 0 upper
array is supplying instructions 0 and 1, it obviously cannot
supply 2 and 3. The only way to solve this problem with the
conventional approach is to split each array into two halves.
This, however, would require twice as many wires and possibly
sense amplifiers, producing a sizable increase in area cost.
By making a slight modification to the way the data arrays
are organized and addressed, we found we could avoid this
pitfall and meet both of our requirements.

Our addressing approach on the PA 7300LC is called checkerboarding.
Fig. 5b shows how instructions are fetched from
the instruction cache on the PA 7300LC. There are, again,
four arrays: left upper, left lower, right upper, and right
lower. The most significant address lines, Address[0:9], go to
all four arrays, while Left[11] goes only to the two left arrays
and Right[11] goes only to the two right arrays. A single
address bit, Address[10], selects between the upper and lower
arrays, as before.

When an instruction is fetched, both Left[11] and Right[11] are
set to the value of Address[11]. Because of this, the operation
is virtually identical to the conventional approach described
above, except for one key difference: a cache line for a
given group is spread across all four arrays, rather than just
two. This can be seen in Fig. 5b, where the instructions
corresponding to group 1 have been shaded. Each array contains
pieces of cache lines from both groups in a checkerboard
fashion.

Fig. 5c illustrates how checkerboarding allows simultaneous
access to an entire cache line. By setting Left[11] to the group
desired and Right[11] to the opposite value, all eight instructions
from one group can be read or written. In the example
shown in the figure, an entire cache line from group 1 is read
out. Left[11] is set high, while Right[11] is set low. Address[0:9]
selects which pair of sets, Set[x] and Set[x+1], are accessed.
Fig. 6 lists the results of addressing the arrays with the
various combinations of values on Left[11] and Right[11].
Instruction Cache Hit Stages. The CPU core will attempt to
fetch a pair of instructions from the instruction cache every
cycle during which it is not stalled. For example:

• The instruction fetch address arrives at the instruction
cache at the end of the P stage of the pipeline.

• On the first half of the F stage, the word line decoders fire
one word line to each array.

• On the second half of the F stage, the array is read, driving
its value onto the bit lines to the sense amplifiers. The way
multiplexer then selects the proper pair of instructions from
the sense amplifier outputs.

• On the first half of the I stage, the instructions are driven to
the execution units for decoding and execution.

Instruction Cache Miss Stages. In the case of an instruction
cache miss, which is known by the end of the F stage of the
pipeline, the entire pipeline will stall. A read request for an
entire cache line will then be sent to the memory controller.
This request is called a copyin request. A 64-bit data path
between the memory controller and the instruction cache
requires a minimum of four cycles to transfer the entire
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Timing Flexibility 



Microprocessor design is a time-consuming and expensive process.
Ideally, a design should scale through several fabrication process generations
with low-investment algorithmic artwork shrinks to help amortize
the cost of the original design.

Although it is relatively straightforward to increase the processor frequency,
the frequency of interconnect to the rest of the system is more
or less fixed. Typically the base processor design has the capability for
a range of core-processor-frequency-to-interconnect-frequency ratios.

The PA 7300LC has three interfaces that are tolerant of increases in the
processor frequency: the I/O bus interface, the main memory interface,
and the second-level cache interface.

The cycle time of the general system connect (GSC) I/O bus can be configured
to some multiple of the processor's cycle time. The I/O controller
supports ratios from three to nine. The second-level cache controller can
be configured to support a variable number of CPU cycles per second-level
cache cycle. The controller supports two, three, or four CPU cycles
per cache cycle. Similarly, the main memory controller can configure the
setup and hold times of the DRAMs to be two, three, or four CPU cycles.
Additionally, seven key DRAM timing parameters can be individually
programmed.

As the processor gets faster, performance may improve but only as a
sublinear function of processor frequency since memory and I/O performance
remain constant. The large first-level caches on the PA 7300LC
help insulate the processor from the effects of the relatively slow memory
accesses, allowing the performance to scale well with increasing
core processor frequency. The initial frequency target for the PA 7300LC
was 132 MHz, but design ratios support core processor frequencies up
to 360 MHz.

Two additional benefits are derived from the timing flexibility of the
PA 7300LC. The increasing availability of higher-speed DRAMs and
SRAMs makes it a simple matter to configure the timing generators
to take advantage of these new components. Also, timing flexibility
decouples the design effort from uncertainties that develop as RAM
component vendors traverse their own development cycles.



cache line to the instruction cache. Four cycles are required
because the memory controller can only deliver 64 bits per
cycle and a cache line contains 256 bits. The memory controller
will return the pair of instructions originally intended
to be fetched first, regardless of the pair's position within
the cache line. As each pair of instructions is returned from
memory, it is written into a write buffer. The instructions can
be fetched directly from this buffer before they are written
to the cache, with the first pair's arrival causing the pipeline
to resume execution. This capability is commonly referred
to as streaming. In effect, the write buffer forms a third way
of associativity. After the last pair of instructions arrives from
memory, the write buffer contents are written to the cache
in one cycle.
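
The four-cycle transfer and the "requested pair first" return order can be sketched as follows. The text states only that the requested pair comes back first. The wraparound order used for the remaining pairs is an assumption made for illustration, as are the function names.

```python
LINE_BITS = 256   # one cache line: eight 32-bit instructions (from the text)
BUS_BITS = 64     # data path width between memory controller and cache


def transfer_cycles(line_bits=LINE_BITS, bus_bits=BUS_BITS):
    """Minimum cycles needed to move one cache line across the data path."""
    return line_bits // bus_bits


def return_order(requested_pair, pairs_per_line=4):
    """Order in which instruction pairs arrive in the write buffer.

    The requested pair is first (from the text); the wraparound order
    of the remaining pairs is an assumption of this sketch.
    """
    return [(requested_pair + i) % pairs_per_line for i in range(pairs_per_line)]


print(transfer_cycles())   # 4
print(return_order(2))     # [2, 3, 0, 1]
```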

Unified Translation Lookup Table. Since the instruction fetch
address is a virtual address, it must be mapped into a corresponding
physical address at the same time the instruction
cache arrays are being accessed. Normally, a full instruction
translation lookaside buffer, or ITLB, is used to perform this
function. On the PA 7300LC, as on all recent PA-RISC processors,
we felt that the performance improvements achieved
with a separate ITLB and DTLB (for data accesses) did not
warrant the increased chip area costs. Instead, we opted
for a unified TLB that performs both instruction and data
translations.

Instruction Lookaside Buffer (ILAB). Because both an instruction
and a data translation are required on many cycles, a
smaller structure called an instruction lookaside buffer, or
ILAB, is used to translate instruction addresses, while the
larger unified TLB is free to translate data addresses. The
four-entry ILAB is a subset of the unified TLB and contains
the most recently used translations. This strategy is quite
effective because instruction addresses, unlike data addresses,
tend to be highly correlated in space in that they
generally access the same page, a previous page, or the next
page.

When an instruction address does miss the ILAB, normally
because of a branch, the pipeline will stall to transfer the
desired translation from the unified TLB to the ILAB. We
designed in two features to mitigate these ILAB stalls. On
branch instructions that are not bundled with a memory
access instruction (such as a load or store), the unified TLB
will be accessed in parallel with the ILAB, in anticipation of
an ILAB miss. If the ILAB misses, the normal ILAB stall penalty
will be reduced. The second feature we added was ILAB
prefetching. Every time the CPU begins executing on a new
instruction page, the TLB will take the opportunity to transfer
the translation for the next page into the ILAB. This can
completely avoid the ILAB misses associated with sequential
code execution.
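
The effect of next-page prefetching on sequential code can be seen in a toy model. The sketch below is not the hardware algorithm: it assumes a least-recently-used replacement policy for the four entries, represents the unified TLB as a plain dictionary, and does not model unified TLB misses. All names are hypothetical.

```python
from collections import OrderedDict


class Ilab:
    """Toy four-entry ILAB backed by a dict that stands in for the unified TLB."""

    def __init__(self, tlb, entries=4):
        self.tlb = tlb
        self.entries = entries
        self.slots = OrderedDict()   # virtual page -> physical page, oldest first
        self.misses = 0
        self.current_page = None

    def _install(self, vpage):
        if vpage not in self.tlb:
            return                   # unified TLB misses are not modeled
        self.slots[vpage] = self.tlb[vpage]
        self.slots.move_to_end(vpage)
        while len(self.slots) > self.entries:
            self.slots.popitem(last=False)   # assumed LRU replacement

    def translate(self, vpage):
        if vpage in self.slots:
            self.slots.move_to_end(vpage)
        else:
            self.misses += 1         # the real pipeline would stall here
            self._install(vpage)
        if vpage != self.current_page:
            self.current_page = vpage
            self._install(vpage + 1) # prefetch the next page's translation
        return self.slots[vpage]


tlb = {page: page + 100 for page in range(16)}
ilab = Ilab(tlb)
for page in [0, 0, 1, 1, 2, 3]:      # sequential code crossing page boundaries
    ilab.translate(page)
print(ilab.misses)                   # 1: only the very first access misses
```

In this model a later jump to a distant page (for example, `ilab.translate(8)`) misses again, which corresponds to the branch-induced ILAB miss described above.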

Data Cache and TLB 

We designed the data cache array to be very similar to the
instruction cache arrays. Like the instruction cache, the data
cache is two-way set associative, virtually indexed, and
physically tagged. It is composed of three arrays:

• A data array, which has the same checkerboard organization
as the instruction cache data array

• A tag array, which is almost identical to its instruction
cache counterpart

• A dirty bit array, which has no counterpart in the instruction
cache. This array keeps track of whether a data cache line
has been modified by the instruction stream.

Although organized in a way similar to the instruction cache,
the data cache's internal operation and effect on the CPU
pipeline are quite different. The data cache and TLB operate
in the A and B stages of the pipeline. A load instruction
causes a data address to be generated in the first half of the
B stage. The data cache word line decoders operate on the
second half of the B stage. On the first half of the A stage,
the arrays drive their values out. Based on the comparison
between the physical address and the output of the tag
arrays, the way multiplexer then selects the proper data
value. This word or double-word value is then driven to the
integer and floating-point units during the second half of the
A stage.
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Fi>{. 7. The Slot* CJUOUC- 

A store instruction generates a data address in the same
manner as a load instruction. That address is used to read
from the tag array as described above. Instead of using the
store address to read from the data array, however, the address
from the head of a two-entry store queue (Fig. 7) is
used to index into the data array on the second half of the
B stage. The data from the head of the store queue is written
into the data array on the first half of the A stage. The data
from the store instruction is driven from the integer or
floating-point units to the data cache on the second half of
the A stage where it is written into the tail of the store queue.

Store Queue. A load can retrieve data directly out of the store
queue if it is to the same address as the store. The necessity
of the store queue is twofold:

• The floating-point unit cannot drive store data in time to
write the data array during the proper pipeline stage. The
store queue, therefore, provides the time to transfer the
data from the execution units to the data cache.
• Memory cannot be modified until it is known that the store
instruction will properly finish execution. If the store instruction
is going to trap, say, because of a TLB fault, any
architected state, such as memory, must not be changed.

The disposition of the store instruction is not known until
the R stage of the pipeline, well after the data array is to be
written. The store queue serves as a temporary buffer to
hold pending store data. If a store that writes into the store
queue subsequently traps, that store queue entry is merely
invalidated. Also, by using a store queue, we are able to use
a single bidirectional bus to transfer data between the execution
units and the data cache. The store queue allows data
to be transferred on the second half of the A pipeline stage
for both load and store instructions, preventing conflicts
between adjacent loads and stores in the instruction stream.
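
The behavior of the store queue (buffering, load forwarding, and invalidation on a trap) can be sketched in software. This is a simplified model, not the hardware timing: it drains the head entry only when a new store arrives at a full queue, and it uses a dictionary in place of the data array. Class and method names are hypothetical.

```python
from collections import deque


class StoreQueue:
    """Toy two-entry store queue in front of a dict that stands in for the data array."""

    def __init__(self, data_array, depth=2):
        self.data_array = data_array
        self.depth = depth
        self.pending = deque()       # entries are [address, data, valid]

    def _retire_head(self):
        address, data, valid = self.pending.popleft()
        if valid:                    # invalidated (trapped) stores never reach the array
            self.data_array[address] = data

    def store(self, address, data):
        if len(self.pending) == self.depth:
            self._retire_head()      # simplification: drain only when the queue is full
        self.pending.append([address, data, True])

    def trap_last_store(self):
        """The most recent store trapped: invalidate its queue entry."""
        self.pending[-1][2] = False

    def load(self, address):
        for queued_address, data, valid in reversed(self.pending):
            if valid and queued_address == address:
                return data          # forwarded directly from the store queue
        return self.data_array.get(address)


array = {}
queue = StoreQueue(array)
queue.store(0x10, 5)
print(queue.load(0x10), 0x10 in array)   # 5 False: forwarded, not yet written
queue.store(0x20, 7)
queue.trap_last_store()
queue.store(0x30, 9)                     # retires the store to 0x10
queue.store(0x40, 1)                     # drops the invalidated store to 0x20
print(array)                             # {16: 5}
```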

Semaphore Instructions. The data cache performs other memory
operations besides load and store instructions. It handles
semaphore instructions, which in the PA-RISC architecture
require a memory location to be read while that location is
simultaneously zeroed. In operation, a semaphore is quite
similar to a store instruction with zeroed data, except that
the semaphore read data is transferred on the second half of
the A stage. In cases in which the semaphore is not present
or modified in the data cache, the load and clear operation
must be performed by the memory controller.
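
The read-and-zero behavior is what makes a simple lock possible. The sketch below shows the idea in software, assuming the usual convention that a nonzero word means the lock is free. It runs single-threaded, so the indivisibility that the hardware guarantees is only implied here; the function names are illustrative.

```python
def load_and_clear(memory, address):
    """Return the old value of a word and leave zero in its place.

    In hardware the read and the zeroing are one indivisible operation;
    this single-threaded sketch only mimics the result.
    """
    old = memory[address]
    memory[address] = 0
    return old


def try_acquire(memory, address):
    """A nonzero value read back means the lock was free and is now held."""
    return load_and_clear(memory, address) != 0


def release(memory, address):
    memory[address] = 1


memory = {0x100: 1}                  # lock word starts out free (assumed convention)
print(try_acquire(memory, 0x100))    # True
print(try_acquire(memory, 0x100))    # False: already held
release(memory, 0x100)
print(try_acquire(memory, 0x100))    # True
```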

Flush and Purge Instructions. We must also execute flush instructions,
which cause a given memory location to be cast
out of the data cache. Related is the purge instruction, which
at the most privileged level causes a memory location to be
invalidated in the data cache with no cast out, even if the
line is modified.

Reducing Miss Latency. Data cache misses are detected on
the first half of the A stage of the pipeline. To reduce miss
latency, the physical address being read from the data cache
is forwarded to the memory controller before the data cache
hit-or-miss disposition is known. This address is driven to
the memory controller on the first half of the A stage. A "use
address" signal is driven to the memory controller on the
first half of the R stage if a cache miss occurs.

Copyin Transaction. A number of transaction types are supported
between the CPU core and the memory controller.
The most common type is a copyin transaction.

After receiving a copyin request, the memory controller re- 
turns the requested cache line, one double word at a time. 
As with instruction misses, the memory controller returns
the data double word that was originally intended to be 
fetched first. 

On load misses, when the critical double word arrives, it is
sent directly to the execution units for posting into the register
files. On integer load misses, the critical data is bypassed
before error correction to reduce latency even further.

In the extremely rare event that the data contains an error,
the CPU is prevented from using the bad data and forced to
wait for corrected data. As each double word arrives from
the memory controller, it is placed into a copyin buffer.

When all the data has arrived, the contents of the copyin
buffer are written to the data cache data array in a single
cycle. There are actually two copyin buffers to ensure that
two data cache misses can be handled simultaneously.

Fig. 8 shows a block diagram of the copyin and copyout
buffers.

Copyout Transaction. A data cache line can contain modified
data requiring posting or writing back to memory when cast
out. To this end, another transaction type is implemented:
a copyout transaction. A copyout is necessary under two
circumstances. The first case is when a data cache miss is
detected and the existing cache line selected for replacement
has been modified. This is the most common case.

The second case is when a flush instruction is executed and
hits a modified line in the data cache. The data cache supplies
both a physical address and 32 bytes of data on a copyout.
The data cache uses the checkerboard organization, so
the full cache line read for the copyout takes only one cycle.

Reducing Cache Miss Penalties. In the PA 7300LC, we have
taken a number of steps to reduce the penalty caused by
cache misses. As mentioned above, we have reduced cache
miss latencies. We have also continued to adopt a "stall-on-use"
load miss policy pioneered on earlier PA-RISC designs.
In this policy a load miss stalls the CPU pipeline only long
enough to issue the copyin transaction and possibly a copyout
transaction. In many cases, the delay lasts for only one
cycle. The CPU will then only stall when the target register
of the load instruction is subsequently referenced. If the
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critical data returns from memory fast enough, the pipeline 
will not stall at all. 

Because memory data is not needed by the CPU on a store
miss, the CPU only stalls once, again for only one cycle in 
many cases, to issue the copyin and copyout transactions. 

A scoreboard keeps track of which words have been stored
so that the copyin write will not overwrite more recent data.
Since high-bandwidth writes to I/O space can be critical
to graphics performance, under most circumstances the
PA 7300LC will not stall on a store to I/O space. This optimization
is possible because an I/O space access is guaranteed
to miss the data cache, so there is no need to stall the
CPU to perform a copyout read.
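
The role of the scoreboard on a store miss can be illustrated with a small merge routine. The sketch assumes a 32-byte line viewed as eight 4-byte words and one scoreboard flag per word; this granularity and all names are assumptions for illustration.

```python
WORDS_PER_LINE = 8   # 32-byte cache line viewed as eight 4-byte words (assumed granularity)


def merge_copyin(copyin_line, stored_words, scoreboard):
    """Build the line that is finally written to the cache.

    Words that the CPU stored while the copyin was outstanding
    (scoreboard flag set) win over the older data arriving from memory.
    """
    return [
        stored_words[i] if scoreboard[i] else copyin_line[i]
        for i in range(WORDS_PER_LINE)
    ]


from_memory = [10, 11, 12, 13, 14, 15, 16, 17]
stored = [0, 0, 99, 0, 0, 0, 0, 77]
flags = [False, False, True, False, False, False, False, True]
print(merge_copyin(from_memory, stored, flags))
# [10, 11, 99, 13, 14, 15, 16, 77]
```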

Cache Hints. The PA-RISC architecture defines cache hints to
allow the programmer or compiler to communicate information
that can be used by the hardware to improve performance.
We have implemented two of these hints on the
PA 7300LC:

• Block copy store. This hint is used to indicate that software
intends to write an entire cache line. In this case, there is
no need to perform a main memory read on a cache miss.
With this hint specified, upon detecting a store miss, the
PA 7300LC simply zeros out a copyin buffer and continues
without issuing a copyin transaction.

• Coherent operation semaphore hint. This optimization improves
semaphore performance by not forcing the load and
clear operation to the memory unit if the data is present in
the cache.

TLB Access. All memory reference instructions are guaranteed
access to the unified TLB, containing both instruction and
data translations, during the B and A stages of the pipeline.
The TLB is fully associative and contains 96 page translations.
The TLB receives a virtual data address on the first
half of the B stage and drives a translated physical address
on the first half of the A stage. This physical address goes to
the data cache to perform hit comparison and to the memory
controller in anticipation of a data cache miss.

In addition to containing 96 page entries, each of which 
maps to a 4K-byte page, the TLB also contains eight block 
entries used to map larger memory ranges. These block
entries are managed by the operating system. 

CPU Summary 

Although the CPU core of the PA 7300LC is not dramatically
different from its predecessors, several noteworthy features
that improve performance and allow more cost-effective
system designs include:
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> A simple pipeline and a capable superscalar core ihai 
increased our operating fre<|iieney to Kit) MHz. 

i Substantial primary caches integrated directly onto die 
processor chip 

> Most important, cache controllers that take advantage of 
integrated caches, resulting in features designed into the 
CPU core to increase the competitiveness of PA 7300LX ' 
based systems. 

Memory and I/O Controller Design 

The memory and I/O controller (MIOC) is responsible for
interfacing the CPU core to the memory and I/O subsystems.
Integrating the MIOC on the same chip as the CPU core provides
a tight coupling that results in outstanding memory
and I/O performance. The memory controller includes a
main memory controller and a controller for an optional
second-level cache. The I/O controller interfaces the CPU
core to HP's general system connect (GSC) I/O bus and handles
direct memory access (DMA) requests from I/O devices.

CPU to MIOC Interface

The CPU core transmits four basic types of request to the
MIOC:

• Copyins. These requests occur during first-level cache
misses and are used by the CPU core to read a cache line
from the memory subsystem.

• Copyouts. A copyout is a cache line from the CPU core that
must be written to the memory subsystem because it was
modified in the first-level cache by a store instruction.
Copyouts are only issued when a modified cache line is
replaced or flushed from the first-level cache.

• Uncached loads and stores. An uncached load or store
request is a read or write to either memory or I/O for an
amount of data that is less than a cache line.

• Load-and-clears. This request is an indivisible request to
read a location and then clear it. This operation is needed
to implement PA-RISC's semaphore mechanism.

Requests that have addresses located in memory address space
are processed by the memory controller, and all others are
sent to the I/O controller.

The PA 7300LC has a four-entry copyout buffer. Copyouts
are posted to memory as a background operation, allowing
copyins to be processed before copyouts. New copyin requests
are checked for conflict within the copyout buffer.
If there is no conflict, the copyin is processed before all
copyouts to help minimize load-use stalls.
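
The copyout buffer policy can be modeled roughly as follows. The four-entry size and the conflict check come from the text. The names are hypothetical, and the handling of a conflict (draining the posted copyouts before the copyin is serviced) is an assumption, because the text only says what happens when there is no conflict.

```python
from collections import deque


class CopyoutBuffer:
    """Toy four-entry buffer of modified lines waiting to reach memory."""

    def __init__(self, entries=4):
        self.entries = entries
        self.pending = deque()           # (line address, data), oldest first

    def post(self, line_addr, data):
        if len(self.pending) >= self.entries:
            raise BufferError("copyout buffer full")
        self.pending.append((line_addr, data))

    def conflicts(self, line_addr):
        return any(addr == line_addr for addr, _ in self.pending)


def service_copyin(buf, memory, line_addr):
    """Read a line for the CPU, bypassing posted copyouts when it is safe."""
    if buf.conflicts(line_addr):
        # Assumed handling: write the posted copyouts first so the
        # copyin observes the newest data for this line.
        while buf.pending:
            addr, data = buf.pending.popleft()
            memory[addr] = data
    # No conflict: the read goes ahead of all pending copyouts.
    return memory.get(line_addr)
```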

Second-Level Cache Control 

Even though first-level caches on the PA 7300LC are relatively
large for integrated caches, many applications have
data sets that are too big to fit into them. The second-level
cache (SLC) implemented for the PA 7300LC helps solve this
problem. Logically, the SLC appears as a high-speed memory
buffer; other than its performance improvement, it is
transparent to software. The SLC is physically indexed, is
write-through, has unified instructions and data, and is direct
mapped.

The SLC becomes active after an access misses the first-level
cache. The first-level cache miss indication becomes
available after the TLB delivers the real address. As a result,
there is little advantage to virtually indexing the SLC, and
real indexing avoids the aliasing problems associated with
virtual caches.

Multiway Associative Cache Comparison. Multiway associative
caches enjoy better hit rates because of fewer collisions.
However, multiway caches are slower because of way selection,
and for a given cache size, are much more expensive
to implement with industry-standard components. For most
applications, it is more advantageous to trade off size for
ways of associativity.

Write-Back Cache Comparison. Write-back caches* generally
have better performance than write-through caches. However,
sharing the data bus with main memory alters this situation.
If the SLC were write-back, lines copied out of the
SLC would have to be read into the PA 7300LC, the
error-correcting code (ECC) would have to be computed, and the
line would have to be written back to main memory. This
operation would be quite expensive in terms of bus bandwidth.
Instead, dirty lines cast out by the first-level cache
are written to the SLC and to main memory simultaneously.

Any valid line in the SLC always has the same data as its
corresponding location in main memory. Writing simultaneously
to main memory and to the SLC is slightly slower
than simply writing to the SLC SRAM, but produces a good
performance and complexity trade-off when compared to a
write-back design.

DMA Interface. DMA reads and writes from I/O devices are
typically sequential and do not exhibit the access locality
patterns typical of CPU traffic. Entering DMA traffic into the
SLC tends to pollute the SLC with ineffective entries. Instead,
buffering and prefetching inside the DMA interface are better
ways of improving DMA performance. To maintain consistency,
an SLC check cycle is run for DMA writes, and if it
hits, the line is marked invalid. DMA write data is always
written to main memory and DMA reads are always satisfied
from main memory. Because of the write-through design of
the SLC described above, data in the SLC never becomes
stale.
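
The consistency rules described in this section (direct mapping, write-through copyouts, invalidation on DMA writes, and DMA reads served from main memory) can be captured in a small model. The class, the function names, and the tiny eight-line cache are illustrative assumptions; addresses are taken to be line aligned.

```python
class SecondLevelCache:
    """Toy direct-mapped, physically indexed SLC (tags plus data)."""

    def __init__(self, num_lines=8, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines
        self.data = [None] * num_lines

    def _index_tag(self, paddr):
        line = paddr // self.line_bytes
        return line % self.num_lines, line // self.num_lines

    def lookup(self, paddr):
        index, tag = self._index_tag(paddr)
        return self.data[index] if self.tags[index] == tag else None

    def fill(self, paddr, line):
        index, tag = self._index_tag(paddr)
        self.tags[index], self.data[index] = tag, line

    def invalidate(self, paddr):
        index, tag = self._index_tag(paddr)
        if self.tags[index] == tag:
            self.tags[index] = None


def cpu_copyout(slc, memory, paddr, line):
    # Write-through: the SLC and main memory are updated together.
    slc.fill(paddr, line)
    memory[paddr] = line


def dma_write(slc, memory, paddr, line):
    # SLC check cycle: a hit is marked invalid, data goes to memory only.
    slc.invalidate(paddr)
    memory[paddr] = line


def dma_read(memory, paddr):
    # Always satisfied from main memory, which is never stale.
    return memory[paddr]
```

Because every valid SLC line matches main memory, a DMA read never needs to consult the SLC, and a DMA write only needs to knock out a matching line.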

SRAM Components. The PA 7300LC is optimized for both
price and performance. Relatively early in the design process,
it became necessary to select the static random access
memory (SRAM) components used to build the SLC. SRAM
components are frequently used in cache construction because
they offer high speed with moderate cost and capacities.
Given the relatively long design cycles necessary to
produce a complex microprocessor and the uncertainties of
the semiconductor marketplace, it was impossible to predict
which components would be most attractive from a price
and performance perspective when the PA 7300LC entered
full production. Instead of selecting a single component, the
decision was made to support a broad range of SRAM types.
This allowed component selection to be made late in the
development cycle and even upgraded at some point during
the production phase.

Second-Level Cache Size. Most popular computer benchmark
programs have relatively small working sets and are not
particularly sensitive to the performance of the memory
system beyond the first-level cache. On the other hand,
application programs have widely variable working set
sizes. Some are small and fit well in the first-level cache
and some exhibit little reference locality and don't fit in any
reasonably sized cache. Hence, no single SLC size is appropriate.
The PA 7300LC SLC controller supports cache sizes
ranging from 256K bytes up to 64M bytes. Although a
64M-byte SLC is expensive, it might be cost-effective for some
applications.

* In a write-back cache design (also called copy back), data is written only to the cache on
a write, and is not written to main memory until the cache line is invalidated. In a
write-through cache design, data is written to both the cache and main memory on a write.

The SLC data array width can be programmed to either 64 or
128 bits plus optional ECC. However, the width must match
the width of main memory.

Memory Arrays. The SLC consists of two memory arrays: the
data array and the tag array. The data array shares the data
bus with main memory. As an option, ECC bits can be added
to the data array, and the full single-bit correct and
double-bit detect error control invoked for SLC reads. The tag array
includes a single optional parity bit. If parity is enabled and
bad parity is detected on a tag access, an SLC miss is signaled,
the failing tag and address are logged, and a machine
check is signaled.

Main Memory Control

DRAMs. Dynamic random access memory (DRAM) technology
is used to construct main memory because of its high
density, low cost, and reasonable performance levels. The
main memory controller supports industry-standard DRAMs
from 4M-bit to 256M-bit capacities. Systems can have up to
16 slots and total memory can be up to 3.75G bytes, the
maximum possible with the PA-RISC 1.1 architecture.

Data Bus Width. Data bus width can be either 64 or 128 bits
plus optional ECC. The 128-bit data bus width significantly
improves memory performance. The 64-bit option supports
lower-cost systems.



Main Memory Controller. The PA 7300LC main memory controller
is very flexible and is able to support most types of
asynchronous DRAMs. The controller is intentionally not
SIMM/DIMM (single or double inline memory module) specific.
This allows use of the PA 7300LC in a wide variety of
system configurations. The main memory can support extended
data out (EDO) DRAMs, which are similar to other
DRAMs but use a slightly modified protocol that pipelines
the column access.

Fig. 9 shows the timing diagrams of read accesses, emphasizing
the improved data bandwidth of EDO DRAMs compared
to standard page-mode DRAMs.

Error-Correcting Code. The state of DRAM memory cells is
susceptible to corruption from incident energetic atomic
particles. Because of this, the PA 7300LC main memory controller
optionally generates and checks an error-correcting
code. The code is generated over a 64-bit data word. Any
single-bit error within the 64-bit data word can be corrected.
All double-bit errors and all three- or four-bit errors within an
aligned nibble can be detected. The aligned nibble capability
is useful since memory systems are typically built with
four-bit-wide DRAMs. The nibble mode capability allows detection
of the catastrophic failure of a single four-bit-wide DRAM.
Whenever an error is detected, data and address logging
registers are activated to support efficient fault isolation and
rapid field repair.
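
For readers unfamiliar with single-bit-correct, double-bit-detect codes, the sketch below implements a textbook extended Hamming code over a 64-bit word. It is not the code used in the PA 7300LC: the actual check-bit equations are not given here, and this generic construction does not provide the aligned-nibble detection described above. All names are hypothetical.

```python
from functools import reduce

DATA_BITS = 64


def _data_positions():
    # Codeword positions that are not powers of two carry data bits;
    # the power-of-two positions are reserved for the check bits.
    positions, pos = [], 1
    while len(positions) < DATA_BITS:
        if pos & (pos - 1):
            positions.append(pos)
        pos += 1
    return positions


POSITIONS = _data_positions()


def _ones(value):
    return bin(value).count("1")


def encode(data):
    """Return (check, parity): 7 Hamming check bits and 1 overall parity bit."""
    set_positions = [p for i, p in enumerate(POSITIONS) if (data >> i) & 1]
    check = reduce(lambda a, b: a ^ b, set_positions, 0)
    parity = (_ones(data) + _ones(check)) & 1
    return check, parity


def decode(data, check, parity):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    expected_check, _ = encode(data)
    syndrome = expected_check ^ check
    overall = (_ones(data) + _ones(check) + parity) & 1
    if syndrome == 0 and overall == 0:
        return data, "ok"
    if overall == 1:
        # Odd number of flipped bits: treat as a single-bit error.
        if syndrome in POSITIONS:
            data ^= 1 << POSITIONS.index(syndrome)
        return data, "corrected"
    # Nonzero syndrome with even overall parity: double-bit error.
    return data, "uncorrectable"
```

The syndrome identifies the position of a single flipped bit, while the overall parity bit distinguishes a correctable single error from a detectable double error.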

Shared SLC and Main Memory Data Bus 

From a cost perspective, it was desirable to share the large
data buses needed for the SLC and main memory, thereby
lowering the pin count of the PA 7300LC. However, sharing
the large load from main memory DRAM cards would have
significantly impacted the speed of SLC operations. The
solution to this problem resulted in using an FET switch to
isolate the main memory load from the SLC bus when the
SLC is driving the bus, but to allow the bus to be shared
when main memory is being accessed (see Fig. 10). The FET
switch is a relatively inexpensive industry-standard part,
which has a propagation delay of less than 1 ns when in the
on state.

Fig. 10. PA 7300LC block diagram showing the position of the FET switch.

FET Switch. The FET switch also enabled us to connect the
PA 7300LC to legacy 5-volt DRAM cards. The PA 7300LC 
operates at 3.3 volts and is not tolerant of the 5-volt logic 
swing of many existing DRAM cards. Biasing the gate of the 
FET switch to a voltage lower than 6 volts effectively limits 
the voltage swing from DRAM cards to 3.3 volts when seen 
by the PA 7300LC. 

Chip Layout Challenges 

Although the MIOC is a small part of the PA 7300LC, it controls
nearly all of the I/O pins. Because the pins are located
at the chip perimeter, long signal routes from the MIOC to
some pins are unavoidable. Separating the MIOC into several
blocks that could be placed near the chip perimeter and
controlled remotely helped manage this problem. In particular,
the data flow across the shared SLC and main memory data
bus is completely predictable (because there are no slave
handshakes from the memories), making the memory data
interface the ideal block to be controlled from the other side
of the chip.

Cache Miss Data Flow 

The MIOC is highly optimized for satisfying CPU cache 
misses. Although DMA transaction processing is handled 
efficiently, system performance is more sensitive to CPU 
cache miss performance than DMA performance. 



When idle, the SLC and main memory controllers pass
through physical addresses that are coming directly from 
the TLB and going to the SLC and main memory address 
pads. On the cycle following each address, the CPU core 
indicates whether that address resulted in a miss in the first- 
level cache. If a miss occurred, then an access is initiated 
and a cycle is saved by having passed along the physical 
addresses to the SLC and main memory. 

For copyins, the SLC begins an access. The tag and data 
array are accessed in parallel. If there is an SLC hit, then
data is returned to the processor. 

On an SLC miss, the SLC data array data drivers are disabled, 
the FET switch is closed, and control is transferred to the 
main memory controller. 

When a transaction is received by the main memory controller,
it endeavors to activate the correct DRAM page. This may
be as simple as issuing a row address strobe (RAS) with the
proper row address, or may require deasserting RAS, precharge,
and a new RAS. The memory controller sequences up
to the point at which it is ready to issue a column address
strobe (CAS) command, waits there until the SLC misses, and
switches control over to complete the CAS command. However,
if the SLC hits, it will wait for the next transaction and
start the cycle again. Performance is improved by starting
the DRAM access in parallel with the SLC access.

In the case of an SLC miss, once the main memory controller
has control, it issues the proper number of CAS cycles to
read the data. As the data passes the SLC, it is latched into
the SLC data array. At the end of the cycle, the FET switch is
opened, the SLC drivers are enabled, and the next transaction
is processed.



58 June 1997 Hewlett-Packard Journal

©Copr. 1949-1998 Hewlett-Packard Co. 



Reducing Low-Miss Latencies. Much of the work described
above concerns reducing miss latencies. This is important
because even though the PA 7300LC CPU core has a nonblocking
cache, load-use stalls still develop quickly for many
instruction sequences. Low-miss latencies minimize the
impact of these stalls, which results in better overall
performance. At CPU clock rates of MHz, the PA 7300LC, as
seen by the CPU pipeline, is capable of SLC hit latencies
of three cycles with industry-standard Iv-ns asynchronous
SRAM. Main memory latencies can be as low as 13 cycles
with Mt-ns DRAM. Many single-cycle latency reductions
have been implemented in the PA 7300LC; each by itself
would not have much impact on overall memory access latency,
but taken together, they make a significant difference.

I/O Interface 

The PA 7300LC contains interface logic that allows direct
connection to HP's high-speed GSC I/O bus. This interface
processes I/O requests from the CPU core and DMA requests
from GSC I/O bus devices.

Programmed I/O. Programmed I/O allows load and store
instructions from the CPU core to communicate with the I/O
subsystem. From a performance perspective, programmed
I/O writes to graphics devices are important for many workstation
applications. The improvements made for graphics
performance in the PA 7300LC are described later in this
article.

DMA Interface Controller. The DMA interface controller is
designed to minimize main memory controller traffic and
to reduce DMA read latency. The DMA interface controller
employs three 32-byte line buffers. When servicing any DMA
read, the controller requests 32 bytes from main memory
and puts the data into one of the buffers. DMA requests on
the GSC bus may be 4, 8, 16, or 32 bytes long. Since most
DMA requests are to sequential addresses, requests less
than 32 bytes can probably be satisfied from data contained
in the buffer without issuing another request to the main
memory controller. The DMA controller is also able to prefetch
the next sequential line of information to increase the
chances that DMA read requests are serviced from the DMA
buffers.
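
The following sketch models the read side of this scheme: three 32-byte line buffers and an optional prefetch of the next sequential line. The class and attribute names are hypothetical, and the oldest-first buffer replacement is an assumption; the memory argument is any function that returns the 32 bytes of a line.

```python
LINE = 32  # bytes per line buffer


class DmaReadBuffers:
    """Toy DMA read path with three 32-byte line buffers."""

    def __init__(self, memory, num_buffers=3, prefetch=True):
        self.memory = memory            # callable: line address -> 32 bytes
        self.num_buffers = num_buffers
        self.prefetch = prefetch
        self.lines = {}                 # line address -> bytes, oldest first
        self.memory_requests = 0

    def _fetch(self, line_addr):
        if line_addr in self.lines:
            return
        if len(self.lines) >= self.num_buffers:
            # Assumed policy: reuse the oldest buffer.
            self.lines.pop(next(iter(self.lines)))
        self.lines[line_addr] = self.memory(line_addr)
        self.memory_requests += 1

    def read(self, addr, length):
        offset = addr % LINE
        if offset + length > LINE:
            raise ValueError("request must not cross a 32-byte line")
        line_addr = addr - offset
        self._fetch(line_addr)
        data = self.lines[line_addr][offset:offset + length]
        if self.prefetch:
            self._fetch(line_addr + LINE)   # next sequential line
        return data
```

Four sequential 8-byte reads then cost a single main memory request, and with prefetching enabled the next line is usually already buffered when the device asks for it.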

GSC Write Requests. Writes are collected by the DMA hardware
and passed on to the main memory controller. GSC
write requests of 32 bytes are sent directly to the controller,
but when possible, smaller sized writes are collected into
32-byte chunks by the DMA controller to allow the main
memory controller to access memory more efficiently.
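
The write collection just described might look like the following in software. The function name and the exact flushing rules (flush when a write is not contiguous or would leave the current 32-byte line) are assumptions for illustration.

```python
def collect_writes(writes, chunk=32):
    """Merge sequential small writes into chunk-sized memory requests.

    writes is a list of (address, bytes) pairs in arrival order; the
    result is the list of (address, bytes) requests passed to memory.
    """
    out = []
    start, buf = None, bytearray()
    for addr, data in writes:
        fits = (start is not None
                and addr == start + len(buf)
                and (addr + len(data) - 1) // chunk == start // chunk)
        if not fits:
            if buf:
                out.append((start, bytes(buf)))   # flush the partial chunk
            start, buf = addr, bytearray()
        buf += data
        if len(buf) == chunk and start % chunk == 0:
            out.append((start, bytes(buf)))       # a full line is ready
            start, buf = None, bytearray()
    if buf:
        out.append((start, bytes(buf)))
    return out
```

Four sequential 8-byte writes to an aligned line leave the collector as one 32-byte request, while a 32-byte write passes straight through.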

Improvements for Graphics Applications 

Graphics performance depends on many aspects of the
system design. In addition, graphics workloads are sensitive
to the system architecture. For the PA 7300LC, we chose to
optimize the design for engineering graphics, where the typical
workload involves rendering an object to the display
device.

From a high-level point of view, the process of rendering an
object can be divided into three steps:

1. Traversing the display list that describes the object

2. Clipping, scaling, and rotating the object with the current
viewpoint

3. Transforming the object from primitive elements, such as
polygons, into pixels on the screen.

This process can be partitioned in different ways. With
today's powerful CPUs, the most cost-effective method is to
store the display list in the computer system's main memory.
The host CPU performs the display list traversal and the
clipping, scaling, and rotation steps, and then passes primitives
to dedicated graphics hardware for conversion into
onscreen pixels.

Graphics Requirements. Several different models, including
specialized CPU instructions and DMA engines, have been
used to extract data to be rendered from main memory.
While these approaches work, they incur the undesirable
cost of specialized driver software that doesn't port well
between processor generations. Starting with the PA
7100LC, the philosophy has been to support the graphics
requirements within the existing architecture as much as
possible. For example, the PA-RISC architecture defines a
set of 32-bit integer unit general registers and another set of
64-bit floating-point unit general registers. Loads and stores
from either set can be made to memory space, but only integer
register loads and stores were architecturally defined to
I/O space.

Starting with the PA 7100LC, floating-point register loads
and stores to I/O space have been implemented. This has
yielded improved performance because a single load or
store can now move 64 bits and because more registers are
available for operations that communicate with I/O space.

In contrast with specialized operations, extensions within
the architecture are generally applicable and carry forward
into future generations. These optimizations can also be
used to benefit workloads other than graphics.

Graphics Optimizations. Several of the optimizations made in
the PA 7300LC to further improve graphics performance
include:

• A large I/O store buffer

• A relaxation of the load and store ordering rules

• The elimination of a CPU hang cycle previously needed
for I/O stores

• Improvements to the GSC I/O bus protocol.

The structure of industry-standard graphics libraries leads
to bursty graphics I/O traffic. The bursts are of many different
sizes, but the most common burst is a write of 26 words.
The PA 7300LC CPU core-to-I/O interface implements a large
write buffer and can accept up to 111 double-word writes
without stalling the CPU pipeline. This allows the CPU core
to burst up to 111 double-word writes to the I/O subsystem,
and then continue with its next task while the I/O interface
is sending this data out to the graphics hardware.

Graphics Ordering. PA-RISC is a strongly ordered architecture.
Strongly ordered means that all elements of the system must
have a consistent view of system operations. In the case of
graphics performance, this means that all buffered I/O stores
must be observed by the graphics device before the CPU can
access a subsequent piece of data in main memory. Hence,
an I/O store and a following memory read are serialized.
A loophole to the ordering requirement was created for
graphics. I/O stores within a programmable address range
are allowed to be out-of-order with respect to the memory
accesses. The graphics software takes responsibility for
ordering when necessary.
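
The ordering rule and its loophole amount to a simple predicate, sketched below. The function name and the address values used in the usage note are hypothetical; the text does not describe how the hardware makes this check.

```python
def memory_read_must_wait(buffered_io_stores, relaxed_range):
    """Return True if a memory read must wait for buffered I/O stores.

    buffered_io_stores holds the addresses of I/O stores still in the
    store buffer; relaxed_range is the programmable (low, high) address
    range whose stores are exempt from strong ordering.
    """
    low, high = relaxed_range
    # Any store outside the relaxed range keeps strong ordering in force.
    return any(not (low <= addr < high) for addr in buffered_io_stores)
```

For example, with a relaxed range of (0xF4000000, 0xF5000000), a buffer holding only a store to 0xF4000010 does not delay a following memory read, whereas a buffered store to 0xF0000000 does.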

Hang Cycle. Previous PA-RISC processors always incurred a
minimum of one hang cycle for an I/O store. Extra logic was
added to the data cache controller on the CPU core to eliminate
this hang cycle.

Graphics Transfer Size. HP's high-speed GSC bus is used to
connect graphics adapters to the PA 7300LC. The CPU sends
data to the graphics device with I/O stores. In the PA-RISC
architecture, I/O stores are 64 bits or less. The GSC is a
32-bit multiplexed address and data bus. Stores of 64 bits
turn into an address cycle followed by two data cycles. At
best the payload can be only two thirds of the maximum bus
bandwidth. As mentioned above, the average transfer size to
graphics is 26 words. Since these transfers are sequential,
sending an address with every two words is unnecessary.
Some form of address suppression or clustering of sequential
writes was desired. Thus, the write-variable transaction was
created.

Write-Variable Transactions. A new write-variable transaction
type was created for the GSC bus. Write-variable transactions
consist of an address and from one to eight data cycles. Since
the PA 7300LC must be compatible with existing cards that do
not implement the write-variable cycle type, the PA 7300LC
only generates them in configurable address spaces.

With this protocol, the I/O controller blindly issues
write-variable transactions for enabled I/O address regions.
Starting with the initial write, as each write is retired from the
I/O write queue, the I/O controller performs a sequentiality
check on the next transaction in the queue. The process
repeats for up to eight GSC data cycles. Maximum performance
is achieved by allowing the I/O controller to begin
issuing the write when the first piece of data becomes
available.

The length of the transaction is limited to eight data cycles.
Choosing eight data cycles is a good compromise between
flow control issues and amortizing address cycle overhead
with payload. The write-variable enhancement increased
maximum CPU-to-graphics bandwidth from two thirds of
the GSC raw bandwidth to 8/9 of the raw bandwidth. The
PA 7300LC can easily saturate the GSC bus at 142 Mbytes
per second compared with the 50 Mbytes per second
achieved by the PA 7100LC with careful coding.
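
The bandwidth fractions quoted above follow directly from the ratio of data cycles to total bus cycles, and the clustering of sequential writes can be illustrated in the same sketch. The function names are hypothetical, and the queue is simplified to a list of 32-bit word addresses rather than the double-word entries of the real I/O write queue.

```python
from fractions import Fraction


def payload_fraction(data_cycles, address_cycles=1):
    """Fraction of bus cycles that carry data in one transaction."""
    return Fraction(data_cycles, data_cycles + address_cycles)


def cluster_writes(word_addrs, max_data_cycles=8, word_bytes=4):
    """Group queued word writes into write-variable transactions.

    A write joins the current transaction only if it is sequential to
    the previous word and the eight-data-cycle limit is not exceeded.
    """
    transactions = []
    for addr in word_addrs:
        last = transactions[-1] if transactions else None
        if (last and len(last) < max_data_cycles
                and addr == last[-1] + word_bytes):
            last.append(addr)
        else:
            transactions.append([addr])
    return transactions
```

Here payload_fraction(2) is 2/3 for a 64-bit store and payload_fraction(8) is 8/9 for a full write-variable transaction. A sequential 26-word burst clusters into transactions of 8, 8, 8, and 2 data cycles, or 30 bus cycles in total, compared with 39 cycles for thirteen separate 64-bit stores. Taken together with the 8/9 fraction, the 142-Mbyte/s figure implies a raw GSC bandwidth of roughly 160 Mbytes per second.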

MIOC Summary. The MIOC implemented a number of features
that improve system performance while keeping costs low,
including:

• The second-level cache and main memory controllers are
optimized to reduce the latency of copyin requests from the
CPU core.

• The I/O controller improves graphics bandwidth and supports
efficient DMA accesses through the use of buffers and
prefetching.

• The MIOC is designed to be flexible, supporting a range of
second-level cache sizes, a variety of industry-standard
memory components, two different memory widths, and an
optional error correction scheme.

Conclusion 

The PA 7300LC design builds on the success of past processor
designs and offers significant improvements in key areas.
It features a superscalar CPU core, a large, efficient on-chip
cache organization, tightly coupled second-level cache and
main memory controllers, and bandwidth improvements for
graphics. These features combined with frequency increases,
extensive configurability, and high chip quality make the
PA 7300LC attractive for a wide range of computer systems.

High-Performance Processor Design
Guided by System Costs



To minimize time to market and keep costs low, the PA 7300LC design was
leveraged from a previous CPU, the chip area was reduced, cache RAM
arrays with redundancy were added, and high-speed, high-coverage scan
testing was added to reduce manufacturing costs.

by David G. Kubicek, Thomas J. Sullivan, Amitabh Mehra, and John G. McBride



While designing the PA 7300LC processor, the CPU team had
to make design trade-offs between time to market, performance,
and manufacturing costs. Occasionally these seemingly
contradictory goals worked together to drive the team
to a decision. More often, however, the team had to make
hard decisions, weighing the benefits of each of the design
goals.

This paper discusses the strategies used by the PA 7300LC
physical design team to implement the design goals for the
PA 7300LC.

Design Goals 

One of the factors driving the design process was the desire
to bring the product to market as fast as possible. To accomplish
this goal, we employed three major strategies:

• Leverage as much as possible from the previous HP processors,
including hardware, software, and methodologies for
design and test

• Design quality into phase one, or the presilicon design
stage, so that there would be fewer iterations of the design
during phase two, after the first tape release

• Monitor project progress, avoiding any obstacles that might
seriously impact or threaten our schedule.

Keeping the cost of the system as low as possible was another
important goal of the project. Systems based on the
PA 7300LC are meant to position HP in the low-to-midrange
workstation market where prices are set by competition, not
system cost. Therefore, savings in the system cost have a big
impact on profit. To meet these aims, the team decided to:

• Integrate the first-level cache, a major system cost, into the
processor, which had never been done before in an HP
microprocessor

• Integrate the memory and I/O controller (MIOC), creating a
system on a chip

• Reduce chip area to lower cost

• Add redundancy to the SRAM arrays on the chip, allowing
some process defects to be repaired, thereby saving chips
that would otherwise be thrown out

• Provide high-coverage, high-speed scan testing to lower the
manufacturing cost of the processor.

Designs Leveraged to Minimize Time to Market 

To reduce the time to market for the PA 7300LC, the CPU
physical design team decided to leverage as many circuits
as possible from the PA 7100LC. Except for the process
shrink from CMOS26 to CMOS14, much of the superscalar
integer data path on the PA 7300LC was leveraged from the
PA 7100LC unchanged. Also, many of the cells used in the
integer data path were used in other data path blocks on
the chip. Although some of the circuits were reworked for
speed improvements, the floating-point unit was also highly
leveraged from the PA 7100LC. Furthermore, the floating-point
unit was used in the geometry accelerator chip for the
Visualize 48XL graphics product. This leverage strategy not
only helped reduce time to market, but also split the design
costs associated with the circuit between the ASIC and the
CPU.

Control Blocks 

While all of the control blocks leveraged from the PA 7100LC
required some changes, much of the original control logic
remained intact or was at least similar to the original code.
This provided the opportunity to start the physical design
early, providing the designers with the chance to work the
bugs out of the tool flow, begin composition, and provide
early feedback on difficult timing paths to the control
designers.

For physical circuit layout, the control physical team initially
used data scaled from the PA 7100LC in the CMOS26
process to the CMOS14 process. In several cases, the final
artwork was almost entirely based upon the floor plan
scaled from the PA 7100LC. In other cases, the control equations
were either vastly different (memory I/O control) or
entirely new (the cache controllers), so we were unable to
take advantage of earlier work.

In the case of the three main integer control blocks, the
timing information and a significant portion of the control
equations were usable. However, a study of interconnect
between the three blocks indicated that they could be combined
into a single block to simplify the design from a timing
standpoint and to use global routing resources efficiently.
By moving several hundred signals away from the center of
the die into a more localized area near the integer data path,
we also saved significant area.

Core Logic Library. While much of the logical design of the PA
7300LC was leveraged from the PA 7100LC, most of the standard
cell libraries were borrowed from the PA 8000 project.

Fig. 1. The photomicrograph shows a typical FIB (focused ion
beam) repair. For this FIB repair, portions of the circuit of
interest were located under a metal four power bus. Therefore,
openings had to be cut through the power bus to access the
circuits below. Notice how the platinum deposited by the FIB
runs over the top of the metal four, separated by passivation.
(Labels in the photomicrograph: openings cut in metal 4 for new
contacts to metal 2 and metal 3; metal 3 and metal 4 contact
arrays.)

The PA 8000 was fabricated using the same IC process technology
as the PA 7300LC, but was farther along in the design
cycle. The PA 7300LC team was able to use almost the entire
PA 8000 core logic library unchanged. Unfortunately, a different
clocking strategy meant that the driver library needed
significant rework.

Standard Cell-Based Control Block Design. The use of a standard
cell-based design for the control blocks, which was
leveraged from the PA 7100LC, allowed great flexibility
when fixing functional bugs, both in phase one (presilicon)
and in phase two (postsilicon). During phase one, the standard
cell approach permitted fairly quick turnarounds of a
control block for rather complex changes. Often all that was
required of the physical designer was to rerun the synthesis
and routing tools, apply a few hand changes, and verify the
design.

Use of Spare Gates. During the very late stages of phase one
and all of phase two, the use of spare gates in the standard
cell blocks allowed the physical designers to make logical
changes by changing only the metal layers. One very late
bug fix was made between the time the lower-level masks
(e.g., diffusion, well, polysilicon) and the higher-level metal
masks were released to the mask shop. Additionally, when
phase two bugs were found, we were able to use the spare
gates for metal-only changes. Because a number of wafers
were held in the fabrication shop before M1 (the lowest
level of metal) was placed, metal-only changes were run
through the fabrication shop very quickly since the lower
layers were already processed.

FIB Process. Another advantage of the metal-only changes
was exploited during phase two. As control bugs were uncovered,
we were able to rewire the logic using spare gates
and the FIB (focused ion beam) process. The FIB process
uses an ion beam to cut and expose various metal lines on
a functional chip and to deposit platinum, reconnecting the
gates into a new logic structure. A typical FIB repair is illustrated
in Fig. 1. Use of the FIB process allowed the design
team to verify bug fixes in a system that often ran at full
operating speed. This resulted in a more complete functional
verification, since tests run much faster in real silicon than
in simulation.

New Tools. While the synthesis tool (Synopsys) and routing
tool (Cell3 from Cadence Systems) were the same as on the
PA 7100LC project, newer versions of these tools with additional
features and problems were employed. The ability to
work with the tools at an early stage allowed the physical
control design team the chance to learn the strengths and
weaknesses of the tools, so that they could be exploited or
compensated for once full functionality was reached in the
control equations. Even though new versions of these tools
presented a few new problems, the basic method of operation
was the same as for the PA 7100LC. Thus, use of these
tools helped reduce our time to market by leveraging our
previous experience with them.

Phase One Quality Equals Reduced Phase Two Debug 
Time 

In addition to leveraging designs and methodologies, a correct
balance between the time and resources spent ensuring
phase one quality and the time and resources spent finding
functional problems in phase two can also reduce time to
market. In this project, we gave great weight to ensuring
phase one quality, since this would make debugging much
easier in phase two. In return for our investment, the
PA 7300LC had one of the shortest and smoothest phase
two periods of all CPUs designed by HP.

Debugging Trade-offs 

In phase one of the design cycle, simulation, emulation, and
hand analysis are the key tools of the designer. With these
tools, the designer can examine every detail of the design at
any chip state and under any conditions.

In phase two, tests can be run much faster on a real chip
than in simulation, accelerating bug detection. However,
root-cause analysis of problems is a slow and difficult
process because virtually all signals are hidden from the
scrutiny of the designer. In addition, the signals that are
available for the designer to examine are either chip I/Os or are
only indirectly available through scan paths. Hence, electrical
phenomena such as glitches and power supply droops are
not easily observed. Therefore, phase one debugging is a
much simpler process because of the availability of detailed
data about the internals of the chip.

Cross-Checking Designs to Improve Phase One Quality 

To ensure phase one quality, each design on the PA 7300LC
was subjected to a series of computerized automatic design
checks, and then a set of manual checks was performed by
designers who were not involved with the original design.
These checks looked at:
• Circuit topologies
• FET size and connections
• Wire size
• Power routing
• Clock signal routing
• Signal coupling
• Signal types and timing.

Automated and Human Checks 

For automated tests, the computer applied the same rules to
each node on the chip quickly, without the bias that a human
may have had. However, the computerized checks often
generated spurious error messages and required significant
human intervention to identify the real problems. Also, many
rules were beyond the scope of computer algorithms and
required human checks.

An example of a simple computerized check used on the
PA 7300LC is a signal edge rate check. Every signal on the
chip was checked against a set of criteria that depended on
the signal context. The computer program blindly reported
any signal that violated the specification set for that type of
signal. It was the job of the designers to determine which
errors flagged by the computerized check were real
problems. The designers then fixed real errors and waived all
others. Obviously, with this and all other automatic checks,
a certain amount of skill and experience is needed to judge
what constitutes a potential problem and what does not.
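
The edge rate check described above can be sketched as follows. This is only an illustration, not the tool used on the project: the signal types, the limits in `MAX_EDGE_NS`, and the signal names are all hypothetical, since the article does not give the actual criteria.

```python
from dataclasses import dataclass

# Hypothetical edge-time limits (ns) per signal type. The real criteria
# depended on signal context and are not given in the article.
MAX_EDGE_NS = {"clock": 0.3, "bus": 0.8, "control": 1.0}


@dataclass(frozen=True)
class Signal:
    name: str
    kind: str       # key into MAX_EDGE_NS
    edge_ns: float  # measured or simulated edge time


def edge_rate_violations(signals, waived=frozenset()):
    """Blindly report every signal slower than the limit for its type,
    except those a designer has explicitly waived."""
    return [s.name for s in signals
            if s.edge_ns > MAX_EDGE_NS[s.kind] and s.name not in waived]


signals = [
    Signal("ck_main", "clock", 0.25),
    Signal("dbus3", "bus", 1.1),
    Signal("scan_en", "control", 1.4),
]
report = edge_rate_violations(signals)
after_waiver = edge_rate_violations(signals, waived={"scan_en"})
```

The tool applies the same rule to every signal. Deciding that a slow edge on a signal such as `scan_en` is harmless, and waiving it, remains a human judgment.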

Because some quality checks do not lend themselves easily
to computerized checking, each cell, subblock, and major
block of the CPU had to be examined by an experienced
engineer who was not the designer of the block. The
cross-checking engineer had a list of guidelines to follow for
checking each design, and any variance from these
guidelines was discussed with the designer. This checklist was
broken into categories so that the cross-checking engineer
could focus on one particular area at a time, such as
schematics, artwork, test, and so on.

An example of a check that is not automated is an artwork
check which ensures that all circuits have very solid power
and ground networks. The subjective nature of this check
makes it very difficult to implement with a computer check.
Also, because of the subjective nature, the checking engineer
must be very diligent about what constitutes a solid supply
net and what does not.



Circuit Timing 

When operating with a clock period of only a few
nanoseconds, timing is of utmost importance as a phase one quality
issue. Several different tools were used to this end, most
notably Cadence's Veritime, EPIC's Pathmill, and HP SPICE
(see Fig. 2).

Chip Model Tools. A Veritime model was generated and
maintained for the top level of the chip. This model included
either gate-level descriptions of blocks (generally for the
standard cell blocks) or black box descriptions of blocks
(for the custom data path blocks), as well as models for the
delay due to the interconnects between blocks. On a regular
basis, the timing team updated the model and performed
timing analysis. The results were then given to the various
block owners, who redesigned slow portions of critical
timing paths.

HP SPICE and EPIC's Pathmill were used by a number of
the custom data path designers to generate black box
models of their blocks for the global Veritime model. Also, some
designers analyzed larger standard cell blocks with Veritime.
Additionally, a tool was developed that estimated the delays
of all signal routes, which could then be hand-checked for
anomalies.

Finally, HP SPICE was used extensively to simulate the
timing of all major buses, many top-level routes, and other
timing-critical paths. All elements of the standard cell
libraries were also characterized with HP SPICE, using
conservative parameters. While this approach caused a few
more phase one headaches for the control designers, we
uncovered no timing issues for standard cell blocks during
phase two characterization.

Chip Composition Focused on Minimizing Cost 

One of the ways that we were able to drive down the cost of
systems that incorporate the PA 7300LC was to reduce the
die size, thereby allowing more die per wafer in fabrication
and improving yield. This was a key focus of the physical
design group, and resources were dedicated to monitoring
the impact of all changes on the manufacturing cost of the
chip. We took several steps during phase one to ensure that
the PA 7300LC was as small as we could reasonably expect.

Global Floor Planning. We started global floor planning and
routing early in the design phase. The initial floor plans,
although they bear little resemblance to the final chip floor
plan, provided the groundwork for early estimates on die
size and feasibility. One of the early decisions was whether
we would use three layers of metal, as on the PA 7100LC, or
add a fourth metal layer. After extensive analysis, we
concluded that, with only three metal layers, our stepper size
would limit us to a very small first-level cache, which would
not meet our performance targets. So, we added the fourth
metal layer. As it turned out, the fourth metal layer was
essential to the success of the project for many other
reasons, even though the decision was made over a year
before tape release.

As the design matured, the floor plan and routes kept up
with the changes and provided feedback on die size and
potential timing problems. Fig. 3 shows the final floor plan.
We made several major compositional changes early solely
to remove congestion on the top metal layer and to compact
the die area. These compositional changes included
movement of a block in the data cache and changing the aspect
ratio of our pad-ring bit slice.

Using the "Dirty Trick." One of the opportunities we saw was
in the composition of the data cache. Originally the data
cache was designed to be completely symmetrical, with a
right side, a left side, and a data path in the middle to merge
the data from the two sides. The design essentially had three
blocks on each side stacked from bottom to top: the data
RAM array, the tag RAM array, and the dirty block array.

As we started routing the chip in its early phases, we saw
that we had much more routing congestion in the channels
above the right side of the data cache than above the left
side. The channels above the right side led to the integer and
floating-point units, while those above the left ran towards
the memory and I/O controller (MIOC) (see Fig. 4a). The
congestion on the right side of the data cache would have
increased the height, while leaving unused area above the
left side.

To deal with this congestion problem, we employed what we
called the "dirty trick." The dirty bit block in the data cache
is used to store one bit of information for each line in the
cache. This bit tells the processor whether the information
contained in that cache line has been modified by the CPU
and is therefore dirty. In our original conception, each side
of the cache had its own dirty block, which consisted of one
bit of information per cache line and an
address-to-cache-line decoder, the latter being ten times larger.

By putting both dirty data bits on the left side of the data
cache and sharing one address decoder, the left side of the
data cache grew by one bit of information, but the right side
shrank by one bit of information and one address decoder
(see Fig. 4b). This was a big win as it allowed the die size to
be shrunk by the height of the dirty block. If we had not
floor planned and routed early in the design phase, we
would not have seen this opportunity in time to act on it
and reduce the die size.
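
A rough worked example shows why sharing the decoder pays off. The numbers below are illustrative relative units only, assuming the decoder is exactly ten times the size of one dirty-bit array, as the text loosely states; they are not measurements from the chip.

```python
# Relative sizes (arbitrary units). Assumption: decoder = 10 x one bit array.
BIT_ARRAY = 1
DECODER = 10 * BIT_ARRAY

# Original symmetrical layout: each side has its own bit array and decoder.
original = {"left": BIT_ARRAY + DECODER, "right": BIT_ARRAY + DECODER}

# "Dirty trick": both bit arrays on the left, sharing a single decoder.
shared = {"left": 2 * BIT_ARRAY + DECODER, "right": 0}

total_before = sum(original.values())
total_after = sum(shared.values())
```

Under these assumptions the dirty-bit logic drops from 22 units to 12, and the congested right side loses its dirty block entirely, which is what allowed the die to shrink by the height of that block.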

Outer Ring of Pads Limited. We were not in the clear yet,
however. After all of this work on driving down the die size, we
entered an interesting situation. Even though we were able
to shrink the core dimensions, we were now limited by the
outer ring of pads that connect the die to its package. This
was not an issue earlier, since we had fewer pads and a
larger core, but as the project progressed we added pads
and shrank the core until we reached this predicament.
However, the I/O ring team elongated our bit slice in the ring
slightly and narrowed the width considerably, allowing us to
reduce our die size until we were once again limited by the
size of the core. Again, this trade-off on the bit slice
dimensions was not readily apparent at the outset of the project,
but was obviously a big win when we analyzed the situation.



Metal-Four Routing. The last obstacle came after we finished
automated signal routing. The router we used, HARP, was
designed for the three-metal-layer process used on the
PA 7100LC and so it was unable to automate the fourth metal
layer. It was a channel-based router, which allowed the block
designers to use all three metal layers within their block
boundaries, but required us to leave areas free between the
top-level blocks for the interconnect. We used HARP to
connect signals between top-level blocks, but we left major
buses, power connections, clocks, and speed-critical signals
for the hand-routed fourth metal layer. This meant, however,
that any layer-four metal used within the blocks could
interfere with the global metal four, which we planned to run
over the blocks on the chip, not merely in the channels.
Therefore, from the outset of the project, metal four was
under the ownership and control of the composition team.

1 HARP (Hewlett-Packard Automatic Routing Program) is an internal routing tool that was
leveraged from the PA 7100LC toolset.

The cache array designers were given full control of
layer-four metal in their areas, but all other block designers
proceeded as if they only had three metal layers. As the global
metal-four floor plan matured, metal four was released to the
block owners to reduce area in places that did not conflict
with the global route. In all other cases, the block owners
were constrained to use the lower three metal layers instead
of placing obstructions to the global metal-four route, even
if it meant growing their blocks.

This stingy allocation of metal four became very important
as new buses and timing-critical signals were promoted up
to the metal-four "superhighway." Near the end of the
project, the connection of the metal-four power buses to the
top-level blocks became more and more challenging, and
would have been impossible if not for the freedom retained
by keeping metal four clear of obstacles.

Leaving the flexibility to make last-minute changes was
critical to meeting our die size commitment. Since, at that point
in the project, our packages had been ordered with a
specified die cavity, changing would have had serious financial
and schedule implications.

Practice Runs 

The PA 7300LC team used several techniques to ensure that 
the project would proceed as smoothly as possible. These 
techniques included building an SRAM test chip and doing a 
mock tape release. 

Using a Prototype Chip. Before the PA 7300LC, HP had never
produced a CPU with any significant amount of on-chip
memory. How could we ensure that the cache would work
in first silicon? Without a working cache, running test code
in an actual system would not be practical. To help ensure
a working cache in first silicon, we designed and built an
experimental memory chip, featuring various RAM cell
topologies. This test chip provided a large amount of
visibility into the workings of the RAM cells. It also proved to be
an excellent tool for analyzing the workings of the on-chip
cache. Because the RAM design was effectively "phase two
verified" during phase one of the CPU design cycle, the
PA 7300LC on-chip cache worked in first silicon, greatly
easing the time and resources required for phase two
debugging of the rest of the CPU.

Mock Tape Release. Tape release of a CPU is quite a
complicated process, involving several steps of database copying,
verification, translation, and data transmission. Also, it does
not lend itself to leveraging. Any one of these steps could
cause a delay of several days in fabricating the chip.
Therefore, we performed a mock tape release in which each step
was executed as if it were part of an actual tape release. The
only exception was that the data used was not the final, fully
designed CPU. When the time came to do the actual tape
release, the process went very quickly and smoothly.

SRAM Redundancy Improves Yield 

With about eight million of the nine million transistors on
the PA 7300LC, the cache is the most likely block on the
chip to fall victim to a fabrication defect. Therefore, we
added redundant blocks of four columns each in the SRAM,
so that a block that contains a fabrication defect can be
replaced with a functional block via a set of multiplexers.
Fig. 5 illustrates the shifted block method used to replace
the defective block with a redundant block.

Fig. 6. Two of the serial number fuses in the photomicrograph above
have been blown by the laser. The other two are left intact.

The select logic on the multiplexers shown in Fig. 5 is
controlled by a fuse that can be blown with a laser to deselect
the failing block of columns and select an adjacent block.
The adjacent block's multiplexer must also be programmed
to select the next block. This ripple continues until one of
the redundant blocks has been substituted into the RAM
array. By substituting an adjacent block rather than
immediately substituting the redundant block for the defective
block, timing changes are minimized.
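
The shifted block method can be sketched in a few lines. This is a simplified model under stated assumptions, not the actual circuit: it assumes a single redundant block at the high end of the array, one two-way multiplexer and one fuse per logical block, and hypothetical function names.

```python
def fuse_settings(num_blocks, bad_block):
    """One fuse per logical block. A blown fuse (True) makes that block's
    multiplexer select the physical block one position over. Every block
    from the defective one onward must be shifted, so the change ripples
    to the end of the array."""
    return [i >= bad_block for i in range(num_blocks)]


def physical_block(logical, fuses):
    """Physical block selected by the multiplexer for a logical block."""
    return logical + 1 if fuses[logical] else logical


NUM_BLOCKS = 8  # logical blocks; physical block 8 is the redundant one
fuses = fuse_settings(NUM_BLOCKS, bad_block=5)
mapping = [physical_block(i, fuses) for i in range(NUM_BLOCKS)]
```

In this model every logical block lands on either its own physical block or the immediate neighbor, so no signal has to travel farther than one block pitch. That is the property that keeps timing changes small.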

Adding a Serial Number 

One of the new features incorporated on the PA 7300LC is a
serial number individually programmed onto each die by the
same laser that programs the redundancy selection
multiplexers for the on-chip cache SRAM. As wafers are put on
the laser for cache redundancy programming, we are able to
blow the serial number fuses at the same time. The serial
number feature was added to the production flow with very
little overhead. Fig. 6 shows a set of serial number fuses.
The serial number was added to the PA 7300LC because:
• It provides the ability to track any given die back to its
original lot, wafer, or die designation, so we can analyze
information gathered on the die at wafer test and at initial package
test. In previous microprocessors, we were unable to track
backwards in this fashion.
• It allows the design team to select specific dice off a wafer
without having to remove the whole wafer from the
production flow. This makes it much easier to grab interesting parts
for further characterization.
• It provides a convenient way to refer to and classify
production parts. The serial numbers became an invaluable part of
the phase two debug effort, because we were able to know
the history of the part we were debugging.

High-Speed, High-Coverage Testing Reduced
Manufacturing Cost

Both broadside parallel and serial scan tests were used to test
the PA 7100LC. Many of these tests were leveraged for use
with the PA 7300LC. Some tests were simply copied from
the PA 7100LC test suite and reformatted for use with the
PA 7300LC. These tests included legal PA-RISC assembly
code for parallel vectors and serial scan tests of highly
leveraged blocks, such as the integer data path.

Other tests required small changes. For instance, TLB tests
on the PA 7100LC involved writing and then reading a variety
of values for each TLB entry. Then the test simply looped
through this process for each of the 64 entries in the TLB.
Thus, to test the PA 7300LC's 96-entry TLB, we merely
changed the loop value from 64 to 96 entries and
reformatted the test.
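
The structure of such a leveraged test might look like the following sketch. The real tests were PA-RISC assembly code and scan vectors. The `FakeTLB` class, the data patterns, and the function names here are hypothetical stand-ins used only to show how the entry count is the single parameter that changes.

```python
class FakeTLB:
    """Stand-in for the hardware TLB, used only to make the sketch runnable."""

    def __init__(self, entries):
        self.slots = [0] * entries

    def write(self, index, value):
        self.slots[index] = value

    def read(self, index):
        return self.slots[index]


# Hypothetical data patterns; the article says only "a variety of values".
PATTERNS = (0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)


def tlb_test(tlb, num_entries):
    """Write and read back each pattern in every entry; True if all match."""
    for entry in range(num_entries):
        for value in PATTERNS:
            tlb.write(entry, value)
            if tlb.read(entry) != value:
                return False
    return True


pa7100lc_ok = tlb_test(FakeTLB(64), num_entries=64)  # original loop value
pa7300lc_ok = tlb_test(FakeTLB(96), num_entries=96)  # leveraged loop value
```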

Automated Test Generation. While many of the highly leveraged
custom data path blocks could use scan tests leveraged from
the PA 7100LC, this was not the case for the logic-synthesized
standard cell blocks because any logic change rendered the
old tests useless. Fortunately, the use of an automated test
generation tool allowed the PA 7300LC team to have a
significant portion of the serial tests written before we received
first silicon. Shortly thereafter, we completed the rest of the
serial tests, with high fault coverage. The control block test
efforts were also helped by widespread use of state memory
latches which were controllable and observable via serial
scan testing.

Manual Test Generation. For custom data path blocks that
were not leveraged, such as those in the MIOC and cache
controller blocks, block designers wrote tests by hand,
ensuring that each transistor in their design would be tested.
Often, this daunting task was aided through the use of Perl
scripts to help generate the test vectors. Thus, many circuit
designers found themselves becoming part-time software
developers until their block tests were written.

Verifying Block Tests. As block designers began generating
serial tests, the ability to verify these tests became an issue.
Simulating a single block test on a model of the chip would
take anywhere from a few minutes to several hours.
However, a real chip could run even the largest test in just a few
seconds. Therefore, a way to verify the block tests on a
real chip could save a lot of simulation time without
compromising test quality.

However, the testers used to test these chips in
manufacturing were not readily available. Furthermore, they were too
expensive to use for this purpose. Since we were running
serial block tests, we only needed to control the chip's serial
test port pins. The other chip I/O pins could be tied to
ground.

Fortunately, we had decided to make our serial test port
comply with the JTAG (IEEE 1149.1) industry standard. This
meant that a relatively inexpensive test port interface was
readily available. We purchased a JTAG Industries PM 3720
and built what we called a "bench tester" around one of
these interfaces. Fig. 7 shows a block diagram of the bench
tester.

We fed the chip power and controlled the reset pins with a
couple of old HP 6002A dc power supplies. The system clock
was provided by an HP 8131A pulse generator. Finally, all of
these components were controlled via the HP-IB and
connected to the lab's computer network through an HP 9000
Model 745 industrial workstation. The network connection
allowed designers to run tests and monitor results from
their desks, or even from home.

1 Perl (Practical Extraction Report Language) is designed to handle a variety of UNIX system
administrative functions.

Many block designers pushed the bench testers well beyond
their original intended use. By the end of the project, they
could be used to create voltage-versus-frequency shmoo
plots as the tests were executed over a range of power supply
and clock frequency values. We even engineered a way to
execute a loop of code in the instruction cache with no
other system support logic, proving that the PA 7300LC is
truly a system on a chip.
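
A shmoo plot is simply a pass/fail grid over two swept parameters. The sketch below shows the idea with a made-up pass criterion. The `fake_chip` function, the voltage and frequency values, and the text output format are assumptions for illustration and do not describe the actual bench tester software or the chip's real operating range.

```python
def shmoo(run_test, supply_mv, freqs_mhz):
    """Run the test at every (voltage, frequency) point and return one
    text row per voltage: 'P' for pass, '.' for fail."""
    rows = []
    for mv in supply_mv:
        marks = "".join("P" if run_test(mv, f) else "." for f in freqs_mhz)
        rows.append(f"{mv / 1000:.1f} V {marks}")
    return rows


def fake_chip(mv, freq_mhz):
    """Made-up behavior: maximum passing frequency rises with supply voltage."""
    return freq_mhz <= mv // 20


grid = shmoo(fake_chip, supply_mv=[3000, 3300, 3600], freqs_mhz=[140, 160, 180])
```

On a real bench, `run_test` would program the power supply and pulse generator over the instrument bus, run the block test through the serial test port, and return whether it passed.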



Summary 

In conclusion, the PA 7300LC design team owes much of its
success to previous project teams. Our aggressive
time-to-market goals were met not only because of circuit leverage,
but also because of methodologies from previous projects.
Also, an early focus on quality prevented a lot of rework at
the end of the project. Excellent performance from this
highly integrated processor gives HP a competitive
advantage in the cost-sensitive, performance-hungry market for
which it was designed.

UNIX is a registered trademark in the United States and other countries, licensed exclusively
through X/Open Company Limited.

X/Open is a registered trademark and the X device is a trademark of X/Open Company Limited
in the UK and other countries.

Verifying the Correctness of the
PA 7300LC Processor



Functional verification was divided into presilicon and postsilicon phases.
Software models were used in the presilicon phase, and fabricated chips
and real systems were used in the postsilicon phase. In both phases the
goals were the same: to find design bugs and ensure that customers get
the highest quality part possible.

by Duncan Weir and Paul G. Tobin



Ensuring the correctness of the complex PA 7300LC design
required an extensive verification effort. We wanted to
ensure that no customer would ever encounter a design bug.
To reach this goal, we set out to exercise the design more
extensively than is done with user software. Previous HP
processors have maintained a well-earned reputation for
quality, and we wanted the PA 7300LC to meet or exceed
the quality of its predecessors.

This paper discusses the methodology used to verify the 
correctness of the PA 7300LC and the diagnostic hardware 
incorporated into the design to support debugging. 

Functional Verification 

The functional verification effort was divided into presilicon
and postsilicon phases. The presilicon phase involved
creating a software model of the chip and an environment in
which the model could be thoroughly tested and debugged.
The modeling environment provided many features to aid
verification including the ability to initialize the machine
state, inject stimuli, and see into all portions of the design
for debugging. One major drawback of the modeling
environment was the slow simulation speed.

Complementing the presilicon effort, an extensive
postsilicon verification program was completed that took
advantage of the test throughput available when running on an
actual computer.

Extensive testing of the physical circuit design of the
PA 7300LC was done in presilicon and postsilicon
environments to ensure that the circuits would meet frequency,
voltage, and temperature targets. This topic is covered in
the article on page 61.

Presilicon Verification 

For better efficiency, we chose to divide the design of the PA
7300LC into two components: the CPU core and the memory
and I/O controller (MIOC). These two portions of the design
were logically separated by a well-documented interface that
enabled us to verify each component independently. Verifying
the two components independently provided several benefits:
• Smaller and faster models
• Precise control over the stimuli at the CPU-MIOC interface
• Simpler model management (because less coordination was
needed)
• Reduced debugging time (since it was known which portion
of the design contained the bug).

As the design neared completion and both the CPU and MIOC
had been extensively verified, we created a single merged
model that included both components. This provided a
thorough check of the interface between the components and
was a double check of the independent verification work.
In addition, the MIOC was incorporated into a model with
external I/O devices to ensure that the PA 7300LC design
would work with the components needed for a complete
computer system.

The presilicon verification environment consists of three
parts: modeling environment (model), test case environment
(stimuli), and checking environment (checks).

Modeling Environment

We modeled the PA 7300LC design using the Verilog
hardware description language. The design was primarily modeled
at the logic gate level with connectivity extracted from the
physical design. Some key portions of the design like the
caches, TLBs, and floating-point execution units were
modeled at a higher level to improve the size and speed of
the model.

Fig. 1 shows the CPU and MIOC modeling environments.
Software emulators were connected to the model interfaces
to provide input and respond to output from the model. The
programmable nature of the emulators allowed test cases to
exercise the interfaces fully.

New Modeling Process. Managing the modeling environment
of a large design is a time-consuming task requiring
coordination among all team members. Problems with a model
build could lead to downtime that would stall the
verification effort. To minimize these problems, a new model
building process was implemented for the PA 7300LC design. All
blocks of the modeling environment were placed under
revision control. Any changes had to be included in a process
change order that documented the purpose of the change,
the blocks affected, the dependencies existing between this
and other process change orders, and the testing needed to
verify the change. In addition, an automated model build
procedure was put in place to allow designers to integrate
their changes into a private copy of the model and verify
them in isolation before submitting a process change order.
Finally, before a model was released to the verification team,
it would undergo regression testing to eliminate blatant
errors. Using the new system resulted in a consistently stable
model that accelerated the verification effort.

Fig. 1. Presilicon verification modeling environments. (a) CPU
modeling environment. (b) Memory and I/O controller (MIOC)
modeling environment.

Test Case Environment 

Test cases control the stimuli applied to a model, thereby
providing the event interactions that stress the design. Having
an efficient way for test cases to stress the entire design is
an important factor for improving quality. The strategy used
for the PA 7300LC was largely leveraged from the successful
PA 7100LC effort.1 It provided a simple way to initialize
machine-state resources like registers, caches, TLBs, and
memory. It also allowed high-level coordination of
instructions executed by the CPU along with transactions occurring
at the model interfaces.

Test cases for the PA 7300LC came from three sources:
cases leveraged from the PA 7100LC, new cases focused on
the PA 7300LC, and randomly generated cases.

Thousands of cases that were written to cover the PA 7100LC
design were leveraged to run on the PA 7300LC. Most cases
needed no modifications to be effective because of
similarities in the designs of the two chips. For the portions of the
PA 7300LC design that were different, new cases were
produced. Some of these cases were written to focus on
particular aspects of the design such as instruction-cache misses,
the CPU-MIOC interface, and the second-level cache. Other
cases were produced using random code generators that
were designed to stress the PA 7300LC.

Random code generators are mainly employed for
postsilicon verification, but the PA 7300LC team also emphasized
their use for presilicon testing. Although challenges were
encountered, the results were positive. Many subtle bugs
that might not have been found until postsilicon testing
were discovered early in the design process. Random code
generators also provided an efficient way of achieving broad
coverage with fewer engineers than other testing methods.
See "Random Code Generation" on page 71 for more on this
topic.

Checking Environment 

A modeling environment and interesting stimuli are only two
pieces of the verification puzzle. The other critical piece is
verifying the model's response to stimulation. On a complex
design like the PA 7300LC, with many designers and tens of
thousands of test cases, it would have been impossible to
verify correct model behavior without aids to automate the
process. As a result, a significant part of the PA 7300LC
verification effort was spent creating software modules that
automatically verified the model's response to events created by
test cases.

Modules were compiled into the model to check that the
MIOC followed the proper I/O bus protocol and to ensure
that both the CPU and the MIOC followed the protocol at
the CPU-MIOC interface. Checkers were also written to
ensure that the memory controller obeyed proper timing
protocol on the main memory and second-level cache buses.

CPU Testing. For the CPU core, we linked a PA-RISC
architectural simulator to run synchronously with the model to
ensure that instructions were executed as the architecture
requires. When an instruction finished executing, the results
were compared between the model and the simulator. A
special module called a depiper was written to translate
internal CPU signals into architectural events that could
be checked by the simulator. After a test case finished, the
model's final machine state was compared against the
simulator's final machine state.
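
The comparison against the architectural simulator amounts to checking two streams of architectural events in lockstep. The following sketch assumes a very simple event format, a (register, value) pair per completed instruction, which is a hypothetical simplification and not the format used by the actual checker.

```python
def first_mismatch(model_events, simulator_events):
    """Compare per-instruction results in order. Return the index of the
    first instruction whose results differ, or None if the streams agree."""
    for index, (model, reference) in enumerate(zip(model_events, simulator_events)):
        if model != reference:
            return index
    if len(model_events) != len(simulator_events):
        # One side completed more instructions than the other.
        return min(len(model_events), len(simulator_events))
    return None


# Hypothetical event streams: (destination register, value written).
simulator = [("r1", 5), ("r2", 7), ("r3", 12)]
good_model = [("r1", 5), ("r2", 7), ("r3", 12)]
bad_model = [("r1", 5), ("r2", 8), ("r3", 12)]

clean = first_mismatch(good_model, simulator)
failing = first_mismatch(bad_model, simulator)
```

Checking each instruction as it completes, rather than only the final machine state, points directly at the instruction where the model and the architecture first diverge.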

New Transaction Checker. Logically, the MIOC converts
inbound transactions on one interface to outbound
transactions on a different interface. For example, the CPU core
might initiate a cache line copyin that the MIOC converts to
a read on the memory port. When the memory supplies the
data, the MIOC returns the cache line to the CPU. A special
transaction checker, called the metachecker, was written to
verify that proper transaction conversions occurred. The
metachecker matched inbound transactions with their
associated outbound transactions. Mismatched transactions were
reported as failures.
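
The matching performed by the metachecker can be illustrated with a small sketch. The transaction names in `EXPECTED`, the (type, address) tuple format, and the message text are assumptions made for this example. The article names only the copyin-to-memory-read conversion.

```python
# Hypothetical conversion table: inbound transaction type -> expected
# outbound transaction type on the other interface.
EXPECTED = {"copyin": "mem_read", "copyout": "mem_write"}


def metacheck(inbound, outbound):
    """Match each inbound (type, address) transaction with its expected
    outbound transaction. Return a list of failure messages."""
    pending = list(outbound)
    failures = []
    for kind, addr in inbound:
        wanted = (EXPECTED[kind], addr)
        if wanted in pending:
            pending.remove(wanted)  # consume the matching outbound transaction
        else:
            failures.append(f"no outbound {wanted[0]} for {kind} at {addr:#x}")
    # Anything left over was never requested by an inbound transaction.
    failures += [f"unexpected outbound {k} at {a:#x}" for k, a in pending]
    return failures


inbound = [("copyin", 0x100), ("copyout", 0x200)]
ok = metacheck(inbound, [("mem_read", 0x100), ("mem_write", 0x200)])
bad = metacheck(inbound, [("mem_read", 0x100), ("mem_write", 0x240)])
```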

New Cache Checker. The cache controllers for the PA 7300LC
are among the most complex portions of the design. As a
result, a checker was written to verify their operation. It
monitored the instruction pipeline, the cache read and write
ports, and the CPU-MIOC interface. Any incorrect behavior
was detected and reported.

Ad Hoc Checks. Finally, a collection of small, ad hoc checks
was included in our presilicon testing to cover things that
might otherwise be missed. Some were signal-level checks
(for example, checking that a set of signals were mutually
exclusive); others were special checks required by test
cases. Some checked that performance features such as the
superscalar pipeline were operating correctly.
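
As an illustration of a signal-level check, the sketch below scans a recorded trace for cycles in which more than one of a set of mutually exclusive signals is asserted. The signal names and the trace format (one dictionary per clock cycle) are assumptions made for the example.

```python
def exclusivity_violations(trace, names):
    """Return (cycle, asserted_names) for every cycle with more than one signal high.

    `trace` is a list with one dict per clock cycle, mapping signal name to 0 or 1.
    """
    violations = []
    for cycle, values in enumerate(trace):
        asserted = [name for name in names if values[name]]
        if len(asserted) > 1:
            violations.append((cycle, asserted))
    return violations


# Hypothetical bus-grant signals sampled over three cycles.
GRANTS = ["grant_cpu", "grant_io", "grant_refresh"]
TRACE = [
    {"grant_cpu": 1, "grant_io": 0, "grant_refresh": 0},
    {"grant_cpu": 0, "grant_io": 0, "grant_refresh": 0},
    {"grant_cpu": 1, "grant_io": 1, "grant_refresh": 0},   # two grants at once
]
print(exclusivity_violations(TRACE, GRANTS))  # [(2, ['grant_cpu', 'grant_io'])]
```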

Together, the checkers formed a seamless net to ensure that
incorrect model behavior would be detected. There was
some overlap between the checkers. Many times a design
flaw would get flagged by several of the checkers, but
dividing the work between multiple checkers was an effective
way to reduce the risk of a design flaw escaping detection,
while allowing verification engineers to work in parallel.

Model Matches Physical Design. Once the physical design and
the verification effort stabilized, we verified that the Verilog
model matched the physical design. This was done by
deriving a switch-level model from the actual chip artwork and
running thousands of tests on both it and the Verilog model,
comparing key signals on every clock phase of simulation.

Postsilicon Verification 

Once the design is fabricated, the nature of the verification
effort changes completely. The goals are still the same (to
find the design bugs and ensure that customers get the
highest-quality part possible), but the tools and the approach
are different.

Test Systems. The environment for testing the design shifted
from software models to real computer systems that
included PA 7300LC chips. We set up a number of test
systems, each of which could be controlled remotely from a
host workstation using remote debugger software. The
remote debugger provided us with the ability to load and run
programs on the test system and to examine portions of the
machine state. It also gave us complete control over the
machine without any operating system layers obstructing
our access to system resources.

Because the PA 7300LC is designed to work in a number of
different system configurations, we set up systems that had
different clock frequencies, cache configurations, and
memory timing. To ensure that the design would work with a
variety of different I/O cards, exercisers for the GSC I/O bus
were created that could change their behavior to mimic any
type of I/O card.

Random Code Generation. Random code generators are an
efficient way to take advantage of the speed of postsilicon
testing. With a small amount of human control, these
programs can create millions of unique tests to exercise every
nuance of a complex design. We used random code
generation extensively on the PA 7300LC by employing six
different generators. One targeted the floating-point design, one
was directed at the MIOC, and four covered the entire chip
operation.

Extensive Suite of Tests. We supplemented the random testing
with an extensive suite of tests using I/O exercisers to stress
the MIOC design. Many tests were leveraged from
postsilicon testing of the PA 7100LC and were modified for the
PA 7300LC. Additional tests were written to provide better
coverage, especially for areas where the PA 7300LC design
differed from the PA 7100LC.

Self-Checking Tests. The elaborate checking methodology
from presilicon verification was of no use in postsilicon
testing because it was not possible for the checking software to
observe the design now embedded on a VLSI chip running in
a system. To compensate, all of the postsilicon tests were
self-checking. The generators that created the random tests
also ensured that the chip responded properly to them.



Random Code Generation 

The complexity of processor designs has increased dramatically in an 
effort to improve performance, reduce system cost, and allow processors 
to be used in more system configurations. The increasing complexity
makes it almost impossible to identify the specific event cross-products
that need to be tested to ensure that a design is correct. Random code
generation is an effective method for testing a design without having to
identify exactly what needs to be tested. A random code generator
creates legal, random sequences of machine states and instructions that 
exercise a design more thoroughly than application software. 

The term random is somewhat misleading — generating completely ran- 
dom machine states and instructions would result in uninteresting tests 
as far as stressing the design is concerned. Instead, generators focus on
key aspects of the design while preserving an element of randomness.
Accelerating rare events, hitting boundary conditions, and concentrating
on instructions that exercise complex parts of the design are among the
ways to focus a generator. The probabilistic distribution of random
numbers creates interesting combinations of these focused events.

Although random code generation has higher coverage in postsilicon 
testing where the design can be tested at high speeds, it can also be 
effective in presilicon testing. When running on relatively slow presilicon
models, the effectiveness can be improved by adding more elaborate
checking strategies and focusing the generators on smaller portions of
the design.

Some elements of a quality random code generator include:

• Coverage of the entire design

• Focus on complex portions of the design

• Low fault latency (i.e., a failure gets noticed soon after it occurs)

• Reproducible test cases 

• Aids for debugging failing tests. 

Random testing techniques can also be applied to designs other than
microprocessors. Memory or I/O controllers can use these techniques to
randomly generate machine state and transactions that will stress the
controllers. Designing special-purpose bus exercisers that are controlled
by random test generators can extend such testing into the postsilicon
environment.
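
The ideas of focus and reproducibility can be illustrated with a small Python sketch. It is not one of the six generators mentioned in this article; the opcode list, the weights, and the boundary values are hypothetical choices made only to show weighted selection, biasing toward boundary conditions, and seed-based reproduction of a test.

```python
import random

# Hypothetical instruction classes; larger weights focus the generator on the
# parts of a design assumed here to be the most complex.
OPS = ["add", "load", "store", "branch", "fdiv"]
WEIGHTS = [1, 3, 3, 2, 4]

# Boundary operands are chosen far more often than uniform sampling would allow.
BOUNDARY_VALUES = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]


def generate_test(seed, length=20, boundary_prob=0.3):
    """Return a list of (opcode, operand) pairs that is fully determined by seed."""
    rng = random.Random(seed)           # private generator: the seed reproduces the test
    test = []
    for _ in range(length):
        op = rng.choices(OPS, weights=WEIGHTS, k=1)[0]
        if rng.random() < boundary_prob:
            operand = rng.choice(BOUNDARY_VALUES)
        else:
            operand = rng.randrange(0, 2**32)
        test.append((op, operand))
    return test


failing_seed = 1997
first = generate_test(failing_seed)
rerun = generate_test(failing_seed)
print(first == rerun)   # True: recording the seed is enough to reproduce a failing test
```

Recording only the seed alongside a failure keeps test cases reproducible without storing the generated instruction stream itself.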



System Test. A final element of the postsilicon testing was
verifying that operating systems and application programs
ran properly on computer systems built around the
PA 7300LC. A large amount of testing was done by several
different organizations within HP and included operating
system reliability tests, benchmark programs, and key user
applications.

Verification Results 

The PA 7300LC verification work was a success. Presilicon
testing eliminated over 800 design bugs, and more than
1200 process change orders were added to the model in one
year. The quality of the first revision of the chip was very
high. Only eight functional bugs were found in postsilicon
testing. Of these, only one affected our design partners, and
it had a simple workaround. The HP-UX operating system
was booted shortly after first revision parts arrived. Our
postsilicon testing was far more extensive than what we had
previously done with the PA 7100LC or its predecessors.
The verification effort ensured that the PA 7300LC will
maintain HP's reputation for quality processors.

Debug Support 

The high level of integration on the PA 7300LC reduces the
visibility into chip operation that aids in debugging
prototype silicon. In particular, moving the primary caches onto
the chip removed a valuable source of debug data while also
introducing a new source of potential functional and
electrical problems.

Since the MIOC, floating-point coprocessor, and TLB are also
contained on the same die, the only external pads visible to
debuggers were for the I/O bus and the memory interface. At
the same time, the PA 7300LC had new challenges such as a
large primary on-chip cache, a new IC process, higher
operating frequencies, and a second-level cache. Debug support
was important to improve the signal visibility and to reduce
the risks associated with the new technology.

Debug Mechanisms 

Signal visibility is of primary importance when debugging
a failure, so several techniques were used to make internal 
signals accessible. 

Idle cycles on the GSC I/O bus were used to drive debug
information.

Seventeen special chip pads are dedicated to driving real- 
time debug information. To reduce cost, these pads are not 
bonded in production parts. 

Thorough implementation of IEEE 1149.1 and
sample-on-the-fly (a scan technique invented for the PA 7100LC)
allowed a very broad, but only one-cycle-deep, snapshot of
the chip state to be reported. Custom data capture hardware
was designed to gather the debug traces and present them
to a logic analyzer.

New Pattern Mapping Failure Isolation Technique 

Traces captured from the debug ports can be overwhelming
in size, making it difficult to isolate the failure. The PA
7300LC addressed this problem by implementing circuits to
recognize internal chip state patterns. The patterns are
programmed from software using special instructions
implemented on the PA 7300LC, and the capture of debug traces
can be predicated on a state pattern match. Debug traces are
thus shortened to an interesting region. It is also possible to 
alter the program flow upon a pattern match, allowing a 
branch to diagnostic software to probe for a failure. By 
providing a flexible scheme for programming repeatable 
patterns, the task of isolating a failure and performing ex- 
periments to determine its root cause was greatly simplified. 
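
The effect of predicating trace capture on a pattern match can be modeled in software as a small trigger with a bounded history window. The sketch below is only an analogy: the mask-and-value pattern format, the window depth, and the function name capture_on_match are assumptions, and the article does not describe how the PA 7300LC hardware encodes its patterns.

```python
from collections import deque


def capture_on_match(states, mask, value, depth=4):
    """Keep the most recent `depth` states and stop at the first pattern match.

    A state matches when its masked bits equal `value`. The returned list of
    (cycle, state) pairs ends with the matching cycle; it is empty if the
    pattern never occurs.
    """
    window = deque(maxlen=depth)        # older entries fall off automatically
    for cycle, state in enumerate(states):
        window.append((cycle, state))
        if (state & mask) == value:
            return list(window)
    return []


# Hypothetical 8-bit chip states; trigger when the upper nibble equals 0xA.
STATES = [0x10, 0x21, 0x32, 0xA5, 0x44]
print(capture_on_match(STATES, mask=0xF0, value=0xA0, depth=2))
# [(2, 50), (3, 165)]: only the region around the match is kept
```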

Target Applications For Debug 

Functional and electrical verification were the primary
applications for which the debug circuitry was designed, but
the debug features were general enough that they could be
used to diagnose processor problems encountered during
bringing up the operating system, firmware development,
and benchmarking.

Electrical verification relies more extensively on debug
hardware because failures cannot be reproduced in our software
model of the CPU. Engineers working to verify a chip's
thermal and electrical margins use debug features to investigate
and understand failures occurring at extreme operating
points.

Debug Features 

The PA 7300LC debug features are intended to work in any
environment used to test the CPU: wafer test, package test,
and system test. The debug features are operable and
portable across these environments. In addition, debug circuits
were designed to tighter specifications than the rest of the
PA 7300LC. This ensured that they functioned properly well
into the operating regions where the CPU core is expected
to fail. We achieved this through the use of simple logic and
conservative timing budgets.

Although no major problems were found during qualification 
of the PA 7300LC, debug features were relied upon to help 
fix the problems that arose, helping us to achieve quick time
to market for PA 7300LC-based systems. 

Conclusion 

The extensive verification of the PA 7300LC design was
based on the successful strategy used for the PA 7100LC.
Improvements were made in the model building process and
in the extensive use of random code generation in the
presilicon and postsilicon phases. Many features were added to
the PA 7300LC design to allow efficient debugging of
postsilicon failures. Together, these efforts ensure that customers
get the highest-quality part possible.

An Entry-Level Server with Multiple
Performance Points



To address the very intense, high-volume environment of departmental 
and branch computing, the system design for the D-class server was 
made flexible enough to offer many price and performance features at 
its introduction and still allow new features and upgrades to be added 
quickly. 

by Lin A. Nease, Kirk M. Bresniker, Charles J. Zacky, Michael J. Greenside, and Alisa Sandoval



As the computer industry continues to mature, system
suppliers will continue to find more creative ways to meet
growing customer expectations. The HP 9000 Series 800
D-class server, a new low-end system platform from HP,
represents a radically different approach to system design
than any of its predecessors (see Fig. 1).

The Series 800 D-class server comes at a time when server
systems priced below U.S. $20,000 are at a crossroads.
Commodity technologies are upsizing, enterprise customers
are enjoying choices of product families that offer thousands
of applications, open computing and networking have blurred
the distinctions between competitors' offerings, and finally,
indirect marketing and integration channels can offer
compelling value bundles that were once the exclusive domain
of big, direct-marketing computer system suppliers. These
trends have created an environment of very intense,
high-volume competition for control of departmental and branch
computing.

To address this environment, the system design for the
D-class server had to be flexible enough to offer many price
and performance features at its introduction and still allow
the addition of new features and upgrades to the system
quickly. The server's competitive space, being broad and
heterogeneous, also demanded that the system be able to
accommodate technologies originally designed for other
products, including technologies from systems that cost
more than the D-class server.

System Partitioning Design 

In designing the D-class entry-level servers, one of the
primary goals was to create a new family of servers that could
be introduced with multiple performance points without
any investment in new VLSI ASICs. The servers also had
to be capable of supporting new advances in processors
and memory and I/O subsystems with minimal system
reengineering.

The server family would cover a span of performance points
that had previously been covered by several classes of
servers which, while they were all binary compatible with the
PA-RISC architecture, had very different physical
implementations. The lower-performance-point designs would be
drawn from uniprocessor-only PA 7100LC-based E-class
systems. The upper-performance-point designs would be
drawn from the one-to-four-way multiprocessor PA 7200-
based K-class systems. The physical designs of these systems
varied widely in many aspects. Issues such as 5.0V versus
3.3V logic, a single system clock versus separate clocks
for I/O and processors, and whether I/O devices would be
located on the processor memory bus or on a separate I/O
bus had to be resolved before the existing designs could be
repartitioned into compatible physical and logical
subsystems. Tables I and II list the key performance points for the
HP 9000 E-class, K-class, and D-class servers.




Fig. 1. The D-class server system cabinet. The dimensions of this
cabinet are 10.2 in (26 cm) wide, 23.8 in (60.4 cm) high, and 22.2 in
(56.4 cm) deep.

Table I



Key Performance Points for HP 9000 K- and E-Class Servers

Model  Processors  Clock Speed  Multiprocessor  Memory       Cache          HP-PB      HP-HSC     Disk Capacity
                   (MHz)        Configuration   (M Bytes)    (I/D K Bytes)  I/O Slots  I/O Slots  (G Bytes)

K2x0   PA 7200     120          1-to-4-Way      64 to 2048   64 to 256      4          1          3800
       PA 8000     160          1-to-4-Way      128 to 2048  256            4          1          3800
K4x0   PA 7200     120          1-to-4-Way      128 to 3840  256 to 1024    8          8          8300
       PA 8000     160 to 180   1-to-4-Way      128 to 4096  1000           5          5          8300
Ex5    PA 7100LC   48 to 96     1-Way           16 to 512    64 to 1024     4          4          144



Table II

Key Performance Points for HP 9000 D-Class Servers*

Model  Processors  Clock Speed  Multiprocessor  Memory       Cache          HP-HSC     EISA
                   (MHz)        Configuration   (M Bytes)    (I/D K Bytes)  I/O Slots  I/O Slots

D2x0   PA 7100LC   75 to 100    1-Way           32 to 512    256            4          4
       PA 7300LC   132 to 160   1-Way           32 to 1024   64**           4          4
       PA 7200     100 to 120   1-to-2-Way      32 to 1536   256 to 1024    4          4
       PA 8000     160          1-to-2-Way      64 to 1536   512            4          4
D3x0   PA 7100LC   100          1-Way           32 to 512    256            4          7
       PA 7300LC   132 to 160   1-Way           32 to 1024   64 to 256**    4          7
       PA 7200     100 to 120   1-to-2-Way      32 to 1536   256 to 1024    5          7
       PA 8000     160          1-to-2-Way      64 to 1536   512            5          7

* HP-PB slots = 0 and disk capacity = 5 Tbytes.
** 1M bytes of second-level cache.

Additional constraints on the design were a direct result of 
competitive pressures. As the presence of Industry Standard 
Architecture-based systems has grown in the entry-level 
server space, the features they offer became D-class require- 
ments. These requirements include support for EISA 
(Extended Industry Standard Architecture) I/O cards and an 
increase in the standard warranty period to one year. Both 
of these requirements were new to the Series 800. Also new 
to the Series 800 was the desire to design a system enabled 
for distribution through the same type of independent
distribution channels used by other server vendors. Add to
these constraints the cost sensitivity of products in this 
price range, and we have a system that uses as many indus- 
try-standard components as possible, is extremely reliable, 
and is capable of being assembled by distributors, all with- 
out compromising any performance benefits of current or 
future PA-RISC processors. 

Feature List. The first step in the process of partitioning the
system was to detail all possible features that might be
desired in an entry-level server. This list was compiled by
pulling features from our development partners' requirements
analysis and from knowledge of our competitors' systems.
Once this feature list was developed, each feature was
evaluated against all of our design goals (see Fig. 2). Each feature
was then ranked in terms of its relative need (must, high
want, want) and technical difficulty (high, medium, low).
Determining the possible feature list was the first goal of the
partitioning process; the list was continually updated during
the entire process.
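
One way to picture such a ranked list is as records sorted by need and then by difficulty. The sketch below uses a few of the front-panel entries from Fig. 2; the numeric ordering and the idea of sorting the list are assumptions made for illustration, since the article does not say how the team combined the two rankings.

```python
# Lower numbers sort first: most needed, then least difficult.
NEED_RANK = {"must": 0, "high want": 1, "want": 2}
DIFFICULTY_RANK = {"low": 0, "medium": 1, "high": 2}

# (feature, need, difficulty), taken from the front-panel list in Fig. 2.
FEATURES = [
    ("Two-line LCD display", "high want", "medium"),
    ("Reset switch with key and lock", "want", "high"),
    ("16-character display", "must", "low"),
    ("Display suspected failure causes after faults", "must", "medium"),
    ("Backlit display", "want", "medium"),
]


def prioritize(features):
    """Order features by need first, breaking ties with technical difficulty."""
    return sorted(features, key=lambda f: (NEED_RANK[f[1]], DIFFICULTY_RANK[f[2]]))


for name, need, difficulty in prioritize(FEATURES):
    print(f"{need:<10} {difficulty:<7} {name}")
```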

Once the initial feature list was created, a small design team 
consisting of a mechanical engineer, an electrical engineer, a 
firmware engineer, a system architect, and a system manager
began analyzing the list to see how each feature would affect
the physical partitioning of the system. The goal of this 
process was to generate a fully partitioned mock-up of the 
physical system. Successive passes through the feature list 
led to successive generations of possible designs. With each 
generation, the list was reevaluated to determine which fea- 
tures could be achieved and which features could not. 

Physical Partition. After the first few generations it became 
clear that a few critical features would drive the overall 
physical partitioning. The physical dimensions would be 
determined by the dimensions of a few key subsystems: the 
disk array and removable media, the integrated I/O connec- 
tors, the I/O card cage, and the power supply (see Fig. 3). 
All of these components were highly leveraged from existing 
designs, like the hot-swap disk array module developed by 
HP's Disk Memory Division, and the industry-standard 



Feature                                        Need       Difficulty

Front panel display                            Must       Low
HP 100LX Palmtop front-panel display           Want       High
Two-line LCD display                           High Want  Medium
16-character display                           Must       Low
Backlit display                                Want       Medium
Hexadecimal status information                 Must       Low
English status information                     High Want  Medium
Localized language status information          Want       High
Display suspected failure causes after faults  Must       Medium
Include power-on LED indicator                 Must       Low
Include disk-access LED indicator              Want       Medium
Include blinking "Boot in progress" indicator  Want       Medium
Reset switch                                   High Want  Medium
Reset switch with key and lock                 Want       High



Fig. 2. The feature list for the D-class server's front-panel display.

Fig. 3. The chassis for the Series 800 D-class server. The features that determined the overall size of this chassis were the disk cavity, the
removable media slots, the power supply, I/O slots, and processors. Callouts in the figure: airflow direction, eight I/O slots, processor
carrier, hinged door, power supply, power switch mount, display mount, mini-flexible disc, removable media slots (two 1/2-inch-high
devices), and disk cavity (hot swap optional).



form-factor power supply. The first major design conflict
arose when we realized that these components could not be
integrated in a package short enough to fit under a standard
office desk, and yet narrow enough to allow two units to be
racked next to each other in a standard rack. Numerous
attempts to resolve these two conflicting demands only
succeeded in creating a system that would violate our cost
goals or require more new invention than our schedule
would allow.

In the end, it was determined that the desk-height
requirement and cost goals were more important than the rack
requirement, so the package was shortened and widened to
accommodate all the critical components in a package that
would fit under a standard office desk. Once this was
decided, the system mock-up came together quickly, and the
second goal was reached. The system partitioning shown
in Fig. 3 provides several benefits necessary to achieve our
goals. The standard PC-type power supply helped us to
achieve new lows in cost per watt of power. The division of
the box into the lower core I/O and disk array volume and
the upper expansion I/O slot and processor area helped to
simplify the design of the forced-air cooling system since it
separates the large interior volume of the box into two more
manageable regions.

Printed Circuit Assemblies. The next goal in the process was to
repartition and integrate the disparate design sources* into
logically and physically compatible printed circuit assemblies,
while maintaining all of our design constraints on cost,
expandability, and design for distribution. Again, a single
crucial design decision helped to quickly partition the system:

* The design sources were parts from E-class and K-class servers and the J-class workstations
that were combined to form the D-class servers.



the D-class would not use any high-speed,
impedance-controlled connectors. This decision was made as a direct result
of the K-class development process and the success of the
Series 800 G/H/I-class Model 60 and 70 systems. The K-class
development process showed that although high-speed
impedance-controlled connectors can add excellent flexibility
and expandability to midrange systems, they require a great
deal of mechanical and manufacturing infrastructure.

The processor modules for the G/H/I-class Models 60 and 70
are the uniprocessor and dual-processor versions of the
same board (see Table III). The same printed circuit board is
loaded with either one or two processors at the time of
manufacture. To increase the number of processors in a system,
the entire processor module must be replaced with a new
printed circuit assembly. Other systems, like the K-class
servers, allow for the incremental increase of the number of
processors in the system with just the addition of new
processor modules. Even though the board swap is a less
desirable upgrade path than an incremental upgrade, the success
of the Models 60 and 70 systems led us to believe that it was
quite acceptable to our customers.

This decision simplified repartitioning the design sources,
since it meant that the high-speed processor and memory
clock domains and their data paths could remain on a single
printed circuit assembly, while the moderate-speed I/O
domain and its data paths could cross multiple printed circuit
assemblies. It was determined that both the dual I/O bus
architecture of the K-class and the single I/O bus of the
E-class would be supported in the system. To do this, the
connector technology used in the D-class is modular, allowing
the designs to load only those portions of the connector that
are supported.

Table III

Key Performance Points for the HP 9000 Series 800
G/H/I-class Model 60 and 70 Systems*

This lowers both the material and assembly costs. To further
lower the material cost, the PA 7100LC-based processor and
memory module is fabricated on a smaller printed circuit
board than the PA 7200-based processor module. The
difference in the size of the modules is accommodated by
attaching them to a sheet-metal carrier that adapts the modules to
a common set of card guides. Not only is the sheet metal
cheaper than the corresponding printed circuit material
would have been, but it is also stronger and easier to insert
and remove.

A secondary benefit of this strategy is that it allows new
investment to be made as needed. Historically, I/O
subsystems and technology are much longer-lived than processor
and memory technologies. The partitioning strategy we used
helped to decouple the I/O subsystem from the processor
and memory. As long as they remain consistent with the
defined interface, processor modules are free to exploit any
technology or adapt any design desirable. This also enables
D-class servers to excel in meeting a new and growing
requirement: design for reuse. A customer is able to upgrade
through many performance points simply by changing
processor modules. As some countries are investigating forcing
manufacturers to accept and recycle old equipment, keeping
the return stream as small as possible is highly desirable.

Once the printed circuit assembly board outlines were com- 
plete, the process of adapting the various design sources 
to the new partitioning was time-consuming, but relatively 
straightforward. Fig. 4 shows the various design sources
that were pulled together to form the PA 7200-based processor
module. As portions of designs were merged, altered, and
recombined, the possibility of transcription errors grew. The
original designs were executed by three different labs and 
many different design teams. All designs were fully func- 
tional as designed, but we were extending designs as well as 
integrating them. In an effort to minimize the possibility of 
errors being introduced during the adaptation process, the 
schematic interconnect list was extracted and translated 
into a simulation model. This model was then added to the 
models used to verify the original designs to ensure that no 
new errors had been introduced. 

System Partitioning and Firmware Design 

Because of the partitioning scheme used for the D-class
entry-level servers, the firmware design was a critical factor
in achieving the overall program objective of low cost. The
firmware design addressed cost issues in support of the
manufacturing process, field support, and the upgrade strategy.
In addition, although the underlying hardware is
dramatically different depending upon which processor module is
installed in the system, from the customer's perspective, the
external behavior of each performance point should be the
same. For the firmware design team, that meant that
regardless of the underlying hardware, the entire D-class had to
have the look and feel of a single product.

Fig. 4. The various design sources that were pulled together to form
the PA 7200 D-class module. Labels in the figure: PA 7200 module,
K-class memory extender, and K-class motherboard.

D-Class Subsystems. From a firmware perspective, the D-class
is partitioned into two subsystems: the system board and the
processor modules (see Fig. 5). The system board contains
all of the I/O residing on or hanging off the HSC (high-speed
system connect) bus. This includes optional I/O modules
that plug into the HP-HSC slots, such as the fast-wide SCSI
and graphics cards. It includes core I/O built into the system
board, which provides serial interfaces, single-ended SCSI,
Ethernet LAN, a parallel interface, a mouse and keyboard
interface, and a flexible-disk interface. The EISA bus, which
is connected at one end to the HSC bus, is also found on the
system board. The Access Port/MUX card,* which contains
its own HSC-to-HP-PB I/O bus converter, also plugs into an
HSC slot. In addition to these I/O buses and devices, there
is an EEPROM and two hardware-dependent registers that
hold I/O configuration information. This nonvolatile memory
and the configuration registers are critical to the
partitioning and upgrade strategy for the D-class server.

* HP Access Port is a tool for providing remote support for HP servers.



The core of each processor module houses the CPU,
instruction and data caches, and memory subsystems. Also on the
processor board is an EEPROM and two more
hardware-dependent registers. The PA 7100LC uses the HSC as its
native bus, so its connection to the system board is
relatively straightforward. However, the PA 7200 requires a bus
converter between its native bus and the HP-HSC bus. Thus,
to have one system board common between the two
processor modules, the PA 7200 processor board was burdened
with carrying the bus converter circuitry. One other
significant difference exists between the two processor modules:
scratch RAM.** The inclusion of 32K bytes of static RAM on
the PA 7200 module meant that system variables and a stack
could be set up very early in the boot process. The lack of
this scratch RAM on the PA 7100LC limited the amount of
code that could be common between the two platforms.

Consistent Look and Feel. The goal of having the same look
and feel for the entire product line could be met by having
one common code base for all performance points, but
because of the significant hardware differences mentioned
above, this was not possible. The primary differences lie in
the processors themselves. There is no commonality in the
control and status registers of the two processors, caches
are accessed differently, and the memory subsystems are
too different to share code. These differences, along with
other hardware incompatibilities, meant that each processor
module needed to have its own separate and distinct code
base. However, because the primary differences between
the two PA 7100LC processor versions (75 MHz or 100 MHz)
and the two PA 7200 processor modules are processor speed
and cache size, all performance points that use a common
CPU can be supported by one common code base.

** The PA 7300LC and the PA 8000 also include scratch RAM.

Fig. 5. A firmware perspective of the D-class subsystems.

The three areas needed to give the product line a consistent
look and feel included a common feature set, similar
strategies for handling and reporting errors, and a common user
interface. To ensure consistency in this regard, one engineer
was given responsibility for the same functional area of each
platform. For example, the engineer who worked on memory
code for the PA 7100LC also had responsibility for the
memory code on the PA 7200 platform. Taking advantage of this
synergy paid off especially well in the design and
implementation of the user interface, where differences between
platforms could easily lead to confusion.

System Configuration. Partitioning the EEPROMs between the 
processor board and the system board is a key enabler of the 
upgrade strategy. Since an upgrade consists of replacing the 
processor module, I/O configuration information must remain 
with the system board. The EISA configuration, graphics 
monitor types, and LAN MAC address are stored on the system 
board. Additional control information is used to check 
for consistency between the two EEPROMs. The firmware 
expects the format of the system board EEPROM to be the 
same, regardless of which processor module is installed. 
With all I/O configuration information and control variables 
in the same location and sharing the same set of values, 
processor modules can be freely swapped without changing the 
I/O configuration. 

Dynamic configuration of the system is used to support the 
upgrade strategy, the manufacturing process, and field 
support. When a D-class server is powered on, state variables 
in the processor module's and system board's nonvolatile 
memories are tested for a value that indicates whether or not 
they have been initialized and configured. If they fail this 
test (which is always the case for initial turn-on during the 
manufacturing process), the system's hardware configuration 
is analyzed and the corresponding state and control variables 
are set. Much of this information is available via the 
hardware dependent registers located on each board. The 
processor frequency, system board type, and other details 
concerning the I/O configuration can be read from these 
registers. 

The state variables, which are set as a result of examining 
the hardware configuration, include the system's model 
identifier (e.g., HP9000/S811/D310), hardware version 
(HVERSION), and paths to the boot and console devices. 
The boot path can be either the built-in single-ended SCSI 
device or a hot-swappable fast-wide SCSI device. The 
firmware checks for the presence of a hot-swappable device 
which, if present, becomes the default boot path. Otherwise, 
a single-ended SCSI device is configured as the default boot 
path. The actual hardware configuration is also examined to 
select an appropriate console path. The default console path 
can be either the built-in serial port, the HSC Bus Access 
Port/MUX card, or a graphics console. Depending upon the 
presence of these devices and their configurations (e.g., a 
graphics device must also have a keyboard attached), a 
console path is selected according to rules worked out in 
cooperation with the manufacturing and support organizations. 
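
The default path selection described above can be sketched in a few lines of Python. This is only an illustration: the class and function names are hypothetical, and the console priority order is an assumption, since the article says only that the rules were worked out with the manufacturing and support organizations.

```python
from dataclasses import dataclass


@dataclass
class DetectedHardware:
    """Hypothetical summary of what the firmware found while probing."""
    hot_swap_scsi: bool = False   # hot-swappable fast-wide SCSI device present
    access_port: bool = False     # HSC Bus Access Port card present
    graphics: bool = False        # graphics device present
    keyboard: bool = False        # keyboard attached


def default_boot_path(hw: DetectedHardware) -> str:
    # From the text: a hot-swappable device, if present, becomes the default.
    if hw.hot_swap_scsi:
        return "hot-swappable fast-wide SCSI"
    return "built-in single-ended SCSI"


def default_console_path(hw: DetectedHardware) -> str:
    # ASSUMED priority order; the real rules are not given in the article.
    if hw.graphics and hw.keyboard:   # graphics only counts with a keyboard
        return "graphics console"
    if hw.access_port:
        return "HSC Bus Access Port card"
    return "built-in serial port"
```

Keeping the two decisions in separate functions mirrors the text, where the boot path and the console path are chosen independently from the same hardware survey.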

The same sequence of events occurs when upgrading or 
replacing a processor module. In this case, the system board 
is already initialized and only the processor module requires 
configuration. On every boot, information such as the model 
identifier is checked against the actual hardware 
configuration, and any mismatch will invoke the appropriate 
configuration actions. Likewise, because some information is 
kept redundantly between the processor module and the system 
board, they can be checked for a mismatch. This redundancy 
means that the system board can also be replaced in the 
field with a minimal amount of manual reconfiguration. 
Because a D-class server can consist of any combination 
of two system boards and several different processor 
modules, and because further enhancements will double the 
number of processor modules and include two new CPUs, 
dynamic configuration has obviated the expense of developing 
external configuration tools, reduced the complexity of 
the manufacturing process, and simplified field repairs and 
upgrades. 
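
A minimal sketch of the boot-time checks described above, assuming each nonvolatile memory is modeled as a Python dictionary. The field names (`initialized`, `model_id`, `shared`) are hypothetical stand-ins for the state and control variables the article mentions; the real EEPROM layout is not described.

```python
def boot_checks(proc, sysb, detected_model):
    """Return the configuration actions a boot would trigger.

    proc and sysb are dicts standing in for the processor module's and
    system board's nonvolatile memories (hypothetical layout).
    """
    actions = []
    # An uninitialized store (first turn-on, or a replaced part) is configured.
    for name, store in (("processor module", proc), ("system board", sysb)):
        if not store.get("initialized", False):
            actions.append(f"configure {name}")
    # The stored model identifier is compared with the detected hardware.
    if proc.get("initialized") and proc.get("model_id") != detected_model:
        actions.append("reconfigure: model identifier mismatch")
    # Redundantly stored values are compared between the two memories.
    proc_shared = proc.get("shared", {})
    sysb_shared = sysb.get("shared", {})
    for key in sorted(set(proc_shared) & set(sysb_shared)):
        if proc_shared[key] != sysb_shared[key]:
            actions.append(f"resolve mismatch in {key}")
    return actions
```

For a processor upgrade in this model, `boot_checks({}, {"initialized": True}, "D310")` yields only the action to configure the processor module, which matches the case where the system board is already initialized.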

System Packaging 

Mechanical packaging is one of the key variables in 
maintaining a competitive edge in the server market. The 
challenges involved in the system package design for the 
HP 9000 D-class server included industrial design, 
manufacturability, EMI containment, thermal cooling, and 
acoustics, while keeping the design focus on low cost. 

The D-class low-cost model was based on the high-volume 
personal computer market. However, unlike personal 
computers, server products must support multiple 
configurations with an easy upgrade path and high 
availability. This meant that the D-class package design had 
to be a highly versatile, vertical tower with the ability to 
be rack-mounted in a standard EIA rack (see Fig. 6). It 
allows for multiple processors and power supplies, and can 
support up to eight I/O slots for EISA and GSC cards. The 
design also supports up to five hot-swappable or two 
single-ended disk drives and two single-ended removable 
media devices with one IDE (Integrated Device Electronics) 
mini-flexible disk (Fig. 7). This diversification provides 
the entry-level customer with a wide range of configurations 
at various price/performance points. 

Manufacturing and Field Engineering Support. Concurrent 
engineering was a key contributor to the design for assembly 
(DFA) and design for manufacturing (DFM) successes of the 
D-class server. Since we are very customer focused, we take 
the disassembly and repair of the unit just as seriously as 
the manufacturability of the product. The D-class mechanical 
team worked closely with key partners throughout the program 
to ensure the following assembly and manufacturing 
features: 

• A single-build orientation (common assemblies) 

• Multiple snap-in features 

• Slotted T-15 Torx fasteners (Torx fasteners are used for 
HP manufacturing, and the slot is for customers and field 
engineers.) 

• System board that slides into the chassis 
Fig. 6. A front and back view of the D-class server. 



• Quick access to all components 

• Manufacturing line for high-volume production. 

EMI. Design for EMI containment was a considerable chal- 
lenge for the D-class server program. The package goal was 
to contain clock rates up to 200 MHz. This required a robust 
system design and two new designs for the EISA bulkheads 
and core I/O gaskets. 

The system EMI design is based on a riveted sheet-metal 
chassis using a slot-and-tab methodology for optimum 
manufacturability. A cosmetic outer cover with a hinged door 
completes the EMI structure. EMI is contained using 
continuous seams and EMI gaskets with small hole patterns for 
airflow. 

The EISA bulkhead gaskets required a new EMI design. The 
new design is a slotted pyramid that forms a lateral spring 
element with a low deflection force but a high contact force 
(see Fig. 8). The new design for the I/O gasket includes a 
foam core wrapped with a nickel-over-copper fabric, which 
provides a 360-degree contact around each connector (see 
Fig. 9). These new designs produced excellent results. 

Fig. 7. The D-class cabinet with the door removed. Callouts: 
mini-flexible disk, removable media, hot-swap disks (5). 

Thermal Design and Airflow. The thermal design for the D-class 
server also had some interesting challenges. The design 
strategy had to encompass multiple configurations and 
multiple processor chips and boards. Some of these options 
were in development, but most were future plans. A thermal 
analysis program, Flotherm, was used to develop the thermal 
solution for the system. 

The Flotherm models and tests resulted in the package being 
separated into two main compartments. The top half, which 
includes the I/O and the processors (see Fig. 3), is cooled by 
a 120-mm tubeaxial fan. The processor chips are located 
side-by-side directly behind the front fan, giving an air 
velocity at the processors of about 2.5 m/s. Heat sinks 
are used for processor chips that consume under 25 watts. 
For chips over 25 watts, a fan mounted in a spiral heat sink 
is used. 

The bottom half of the package includes the peripheral bay 
and power supply and components. It is cooled using a 
120-mm tubeaxial fan. However, when the hot-swap disks 
are in use, a separate cooling system is installed. The 
hot-swap bay is a sealed subsystem that uses a small blower 
to pull air through the disks. Any disk can be pulled out and 
the airflow to the other disks remains relatively unchanged. 
The power supply has its own 92-mm fan. 

Acoustics. The acoustical goal for the D-class server was 
5.4 bels at the low end, which was the same as for earlier 
server products. This package has a higher power density 
than previous products, more versatility, higher-speed 
disks, and an off-the-shelf power supply rather than a custom 
one. The fan in the power supply ended up being the loudest 
component of the system. Still, the system came in at 
5.1 bels at the low end by custom tuning sheet-metal parts, 
baffles, and fan speeds. 
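
To put the two acoustic figures in perspective, the short Python sketch below converts bels to decibels and computes the ratio between the 5.4-bel goal and the 5.1-bel result. It assumes the figures are sound power levels on the usual logarithmic scale (1 bel = 10 dB); the function names are illustrative only.

```python
def bels_to_decibels(bels: float) -> float:
    # One bel is ten decibels by definition.
    return 10.0 * bels


def power_ratio(level_a_bels: float, level_b_bels: float) -> float:
    # A difference of one bel corresponds to a tenfold ratio in power.
    return 10.0 ** (level_a_bels - level_b_bels)


goal, achieved = 5.4, 5.1
print(f"goal: {bels_to_decibels(goal):.0f} dB, achieved: {bels_to_decibels(achieved):.0f} dB")
print(f"goal/achieved power ratio: {power_ratio(goal, achieved):.2f}")
```

Under that assumption, the 0.3-bel margin is a 3-dB difference, or roughly half the emitted sound power allowed by the goal.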
Fig. 8. The EISA bulkhead gaskets for the D-class server. 



High Availability and Ease of Use 

Hot-Plug Internal Disks. An important feature that the D-class 
server has brought to the Series 800 product line is 
hot-pluggable internal disk drives. While commodity servers 
had once provided forms of internal hot-pluggable disks, 
these solutions were deemed too likely to cause data 
corruption for use in Series 800 systems. For example, the 
commodity hot-plug solutions evaluated for use in Series 800 
systems had the issue of "windows of vulnerability" in which 
sliding contacts from a swapping disk on a SCSI bus could 
cause data bits to actually change after parity had already 
been driven, causing undetected data errors. With a low 
probability of corruption, this approach may have made sense 
for a workgroup file or print server. However, Series 800 
servers are expected to operate in mission-critical 
environments. 
Fig. 9. A comparison between the new and old core I/O gasket 
designs. The new design provides a 360-degree contact around 
each connector. 



Therefore, a more robust approach to hot-swapping disk 
drives had to be developed for these products. 
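
The article does not spell out the exact failure mechanism, but one general limitation of a single parity bit helps explain why bus glitches can slip through: parity catches any odd number of flipped bits and misses any even number. The Python sketch below illustrates this with odd parity on one data byte, as used on the SCSI bus; the helper names are hypothetical.

```python
def ones(value: int) -> int:
    # Number of 1 bits in the value.
    return bin(value).count("1")


def odd_parity_bit(data: int) -> int:
    # Choose the parity bit so that data plus parity has an odd number of 1s.
    return 0 if ones(data & 0xFF) % 2 == 1 else 1


def parity_ok(data: int, parity: int) -> bool:
    # Receiver-side check: the total count of 1 bits must be odd.
    return (ones(data & 0xFF) + parity) % 2 == 1


sent = 0b10110010
parity = odd_parity_bit(sent)

one_bit_glitch = sent ^ 0b00000001   # a single data bit disturbed
two_bit_glitch = sent ^ 0b00000011   # two data bits disturbed

print(parity_ok(sent, parity))            # intact byte passes
print(parity_ok(one_bit_glitch, parity))  # single-bit error is caught
print(parity_ok(two_bit_glitch, parity))  # double-bit error passes unnoticed
```

A disturbance that affects several lines at once, as sliding contacts might, can therefore leave the receiver with corrupted data and a parity bit that still checks out.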

The approach used for the D-class server was to provide a 
hot-swap solution using logical volume management (LVM)* 
and mirroring middleware facilities, and to offer a disk-drive 
carrier common to the standalone enclosure disk carrier 
being separately developed for the rest of the Series 800 
product line. The common carrier approach would allow 
the field to learn only one solution and guarantee a higher 
volume of parts. In addition, solutions for the data 
corruption problem, the use of sequenced SCSI resets, and an 
automated swap script could be shared by both the enclosure 
team and the D-class server team. 

Mission-Critical ServiceGuard and EISA I/O. HP's 
MC/ServiceGuard product (portions of which were previously 
called Switchover/UX) has been an unofficial standard in the 
application-server industry for several years now. This 
middleware product allows Series 800 systems to operate as 
standby servers for one another, transferring 
mission-critical workloads from a failed system to its 
standby. This feature requires that a system and its standby 
host transactions on the same mass storage buses, enabling 
the standby system to have access to all of the primary 
system's data. Multiple hosting (known as multi-initiator) 
on mass-storage interconnects requires significant design 
attention. In addition, the LANs that allow these systems to 
communicate with one another must offer special capabilities 
in the areas of error handling and reporting. 

A significant challenge for the D-class design was merging 
the workstation, PC, and I/O infrastructures with the Series 
800 infrastructure. This requires supporting MC/ServiceGuard 
capabilities, higher slot counts, more extensive error 
handling, and a remote support infrastructure. Before the 
D-class server, all Series 800 systems that supported 
MC/ServiceGuard were HP Precision Bus (HP-PB) based. 

* LVM is software that allows the number of file systems to be relatively independent of the 
number of physical devices. One file system can be spread over many devices, or one device 
can have many file systems. 
The D-class server's use of EISA and HP-HSC (with the EISA 
form factor) required the team to implement, debug, and 
verify MC/ServiceGuard functionality on these new I/O 
buses. In the process, various I/O implementations had 
to be modified, such as special EISA bus operating modes, 
SCSI adapter functionality, periodic EISA cleanups, 
guaranteed arbitration for core LAN, queueing transactions 
on HP-HSC, extra buffering of data signals, and slot 
configurations. These modifications made the HP-HSC and EISA 
I/O infrastructure more robust in a highly available 
departmental branch server. 

In addition, HP's Access Port product, which is used for 
remote support of HP servers, only existed on the HP-PB. 
Without HP-PB slots, the D-class server would have had to 
either forego remote support or develop a new card. The 
answer was not to develop a new card, but to leverage 
existing logic by supplying a buried HP-PB on the new Access 
Port card. To the user, the D-class server's Access Port is 
an HP-HSC card. However, to the operating system and 
response center engineer, the D-class server's Access Port 
looks like the familiar, compatible HP-PB card found on all 
the other server products. 

Pushbutton Graceful Power Shutdown 

The D-class server is the first Series 800 system to offer 
pushbutton graceful power shutdown. Basically, when a D-class 
system is up and running, the power button is equivalent to 
the command reboot -h, which causes the system to synchronize 
its buffer cache and gracefully shut down. This feature 
is most useful in a branch office or department where the 
server is minimally managed by local personnel. Single-user 
HP workstations had introduced this feature to PA-RISC 
systems. 

Built-in Remote Management 

With an emphasis on remote server management, the D-class 
server team decided to offer the same robust, 
worldwide-usable internal modem as the K-series products. 
This modem offers support for transfer rates well beyond 
those used by today's HP response centers and is integrated 
with the remote support assembly in D-class systems. The 
product also offers a special serial port for controlling 
optional uninterruptible power supplies (UPS), as well as 
pinout definitions for future direct control of internal 
power-supply signals. 

The D-class server team also accommodated "consoleless" 
systems, whereby a D-class server can be completely managed 
remotely without a local console at all. In addition, 
graphics console customers can still take advantage of 
remote console mirroring (formerly reserved strictly for 
RS-232 consoles) by merely flipping a switch on the product. 

Conclusion 

The system partitioning design for the first release of the 
D-class servers helped to achieve all of our introduction 
goals. We were able to introduce both PA 7100LC-based and 
PA 7200-based processor modules, integrate the 
industry-standard EISA I/O bus into the Series 800 hardware 
for the first time, and achieve our cost and schedule goals 
without any investment in new VLSI ASICs. 

In the end, the D-class design had leveraged from all current 
entry-level and midrange Series 800 servers and many Series 
700 workstations. Because of the care taken during the 
adaptation process, performance enhancements made to 
the original design sources were made available in the latest 
D-class module quickly and with very little investment. As 
an example, only two weeks after a larger-cache, 
higher-speed PA 7200 K-series processor module was released, 
the corresponding D-class PA 7200 processor module had been 
modified and released for prototype. This module provides 
four times the cache and a 20% increase in frequency over 
the initial D-class PA 7200 module. 

Table IV summarizes the leverage sources for the various 
subsystems that make up the D-class servers. 



Table IV 
Leverage Sources for the D-Class Subsystems 

D-Class Subsystem                       Leverage Source 

EISA I/O                                HP 9000 Series 700 Workstations 
HP-HSC I/O                              K-class Server and J-class and 
                                        HP 9000 Series 700 Workstations 
Clocking                                E-class and K-class Servers 
PA 7100LC Processor Modules             E-class Server 
PA 7200 and PA 8000 Processor Modules   K-class Server 
PA 7300LC Processor Modules             C-class Server 
Power Supply                            Industry-Standard Suppliers 
Hot Swap Disk Array                     HP's Disk Memory Division 
A Low-Cost Workstation with 
Enhanced Performance and I/O 
Capabilities 

Various entities involved in product development came together at 
different times to solve a design problem, evaluate costs, and make 
adjustments to their own projects to accommodate the cost and 
performance goals of the low-cost HP 9000 B-class workstation. 

by Scott P. Allan, Bruce P. Bergmann, Ronald P. Dean, Dianne Jiang, and Dennis L. Floyd 



The design and development of the HP 9000 B-class 
workstation is a good example of cooperative engineering. 
In cooperative engineering, the various entities involved in 
product development come together at different times to 
solve a problem or make adjustments to their own projects 
to accommodate a common need. Examples of this cooperation 
for the B-class workstation include coordination 
between system designers and firmware developers, the 
addition of new functionality without impacting the 
development schedule, close ties with manufacturing, 
evaluation of implementation based on detailed cost models, 
and simplification of the PA 7300LC design by moving 
clocking functions onto a small chip on the system board. 

Design Objectives 

The design objectives for the B-class workstation were low 
cost, quick time to market, performance, functionality, 
longevity, and modularity. In addition to these objectives, 
the development team's main goal was to produce a workstation 
based on the PA 7300LC processor that would be comparably 
priced to the HP 9000 Model 715 workstation, but with 
superior performance and I/O capabilities. This goal and the 
design objectives remained the same throughout the project. 

With low cost as the primary objective, any feature that was 
perceived as too costly or of limited value to our customer 
base was not included. Leveraged subsystems were reviewed 
in search of creative ways to reduce cost. This led to 
reductions in the cost of the clock circuitry and firmware 
interface and elimination of some legacy I/O interfaces. 
From a cost/performance perspective we were able to justify 
the addition of a PCI (Peripheral Component Interconnect) 
bus, a higher-speed memory technology, a second-level cache, 
and a higher-performance processor and graphics subsystem. 
Fig. 1 shows the B-class workstation. 

Features and Capabilities 

Based on the objectives for the B-class workstation, the 
following features are included in the product: 

• PA 7300LC high-performance, low-cost microprocessor with 
two on-chip associative caches with 64K bytes for data and 
64K bytes for instructions 

• 1M bytes of ECC (error-correcting code) directly mapped 
second-level cache for additional performance 

• HP VISUALIZE graphics technology from HP VISUALIZE-EG 
(entry-level graphics) 

• HP VISUALIZE-IVX graphics on the B132 workstation 
(optional) 

• Six memory slots that support up to 768M bytes of ECC 
memory, including fast-page mode (FPM) and 
extended-data-out (EDO) DRAM dual inline memory modules 
(DIMMs) 

• General system connect (GSC) bus for high-speed I/O 
bandwidth 

• Flexible I/O that includes two I/O slots, which can be 
configured as: 

    Two PCI slots 
    Two GSC slots 
    One EISA slot 

• Optional fast-wide SCSI (20-Mbyte/s) card that supports 
internal and external disks without using an I/O slot. 




Fig. 1. The B-class workstation. 
In addition to these features, the B-class workstation's 
modular design provides simple installation, flexibility of 
use, and easy servicing. This is accomplished through design 
features such as: 

• Simple tray design 
• Built-in expandability 
• Plug-in memory modules. 

Fig. 2 shows a block diagram of the components that make 
up the B-class workstation. 

Processor and System Design 

Since the processor chip used in the B-class products is 
the PA 7300LC, one of the main areas of cooperation was 
between the PA 7300LC processor design team and the 
B-class system design team. 

The previous-generation processor used in HP workstations 
of a comparable price was the PA 7100LC. The PA 7100LC 
was an extremely versatile processor, and many of its 
best points were leveraged into the PA 7300LC design 
(see the articles on pages 43 and 48, among others in this issue). However, 
the PA 7100LC was not without its challenges, such as 
the difficulty in synchronizing the processor clock with 
the GSC (general system connect) bus. 

Clock Frequency 

The GSC bus is a general-purpose synchronous bus used to 
communicate between the processor and I/O. Its phase is 
determined in relation to a nonexistent GSC clock. This 
imaginary clock runs at half the frequency of the clock sync 
signals driven to each GSC device. Its rising edge is defined 
by the rising edge of reset during initialization, and each 
GSC device is responsible for keeping track of the current 
phase of the GSC clock starting from initialization. 
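
The phase bookkeeping described above can be modeled with a one-bit counter per device. The Python sketch below is an illustration under simplifying assumptions, not the actual GSC logic: the class and method names are hypothetical, and one call to `sync_cycle` stands for one period of the sync signal.

```python
class GscPhaseTracker:
    """Tracks the phase of the imaginary half-rate GSC clock in one device."""

    def __init__(self):
        self.phase = None  # unknown until the synchronizing reset arrives

    def reset_rising_edge(self):
        # The rising edge of reset defines a rising edge of the GSC clock.
        self.phase = 0

    def sync_cycle(self) -> bool:
        """Advance one sync period; return True on a GSC clock rising edge."""
        if self.phase is None:
            raise RuntimeError("phase is undefined before reset")
        self.phase ^= 1           # half rate: the phase toggles each sync period
        return self.phase == 0


# Two devices that see the same reset stay in agreement indefinitely.
a, b = GscPhaseTracker(), GscPhaseTracker()
a.reset_rising_edge()
b.reset_rising_edge()
edges = [a.sync_cycle() for _ in range(4)]
for _ in range(4):
    b.sync_cycle()
print(edges, a.phase == b.phase)
```

The model also shows why the reset edge matters: a device that missed it would have no defined phase and could settle half a GSC period away from its neighbors.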

On the PA 7100LC, the GSC bus was only permitted to operate 
at fixed ratios of the processor clock frequency, including 
some odd clock ratios such as 1.5:1 (see Fig. 3). All of the 
clock syncs and the resets used to initialize the GSC clock 
were external to the chip. Designing circuitry to maintain 
these ratios and timing margins with minimal clock skew and 
adequate noise immunity became increasingly problematic. In 
addition, every frequency point of operation required a 
special clock design to ensure maximum performance. This 
limited our ability to select the frequency of operation 
based upon yield at a later point in the design process. For 
the PA 7300LC, the situation became more critical because 
the final processor frequency was still uncertain, and the 
final ratio between the processor frequency and the GSC 
clock was also undecided. 
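
To see why a ratio such as 1.5:1 is awkward, it helps to work it out with exact fractions. The Python sketch below assumes the ratio is expressed as processor clock to bus clock, and the 60-MHz figure is purely illustrative; neither detail comes from the article.

```python
from fractions import Fraction


def bus_frequency_mhz(cpu_mhz, ratio) -> Fraction:
    # Bus frequency implied by a processor:bus clock ratio (assumed direction).
    return Fraction(cpu_mhz) / Fraction(ratio)


def alignment_period(ratio):
    # Smallest whole numbers of (processor cycles, bus cycles) after which
    # the edges of the two clocks line up again.
    r = Fraction(ratio)
    return r.numerator, r.denominator


print(bus_frequency_mhz(60, "1.5"))  # illustrative 60-MHz processor clock
print(alignment_period("1.5"))       # non-integer ratio
print(alignment_period("2"))         # integer ratio
```

With an integer ratio every bus edge coincides with a processor edge, but at 1.5:1 the two clocks line up only once every three processor cycles, which is one reason each operating point needed its own clock design.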

The first approach investigated was to bring the entire 
clocking solution into the PA 7300LC. It would be much easier 
to adjust the delays and control the skew within an ASIC 
rather than in discrete circuits. The proposal was to 
incorporate a phase-locked loop circuit within the PA 7300LC 
to generate the processor clocks from a low-frequency 
external crystal. 

Fig. 3. The PA 7100LC clocking scheme. PCLK = processor 
clock; GSCSYNC = general system connect sync signal. 

The GSC syncs could then be created by dividing the 
phase-locked loop output internally in the PA 7300LC. The 
PA 7300LC would also drive out the reset used to initialize 
the GSC phase. Upon further investigation, the PA 7300LC 
design team became concerned about the risk associated 
with the phase-locked loop. The phase-locked loop was 
considered a major component of the PA 7300LC design. This 
was significant because all post-fabrication verification and 
debugging of the chip would be dependent upon a functional 
phase-locked loop. 

At this point, the B-class system designers and the PA 7300LC 
design team began to look at a mixed solution. The 
phase-locked loop was scrapped to avoid risk, and its die 
area recovered for other uses. The PA 7300LC would continue 
to drive the primary synchronizing reset to eliminate the 
need to synchronize the asynchronous power-on reset to the 
GSC's syncs. The generation of the syncs and the maintenance 
of their skew requirements would be moved to an external 
ASIC. Any necessary turns to a small ASIC would be quicker 
and less expensive. In addition, the clocking solution could 
be completely bypassed to allow continued verification and 
debugging of the PA 7300LC if necessary. 

Working with Motorola, the PA 7300LC design team, and our 
materials organization, the system designers specified the 
device that became the Motorola MPC992 (the clock generator 
in Fig. 2). This device uses a phase-locked loop and an 
external low-frequency crystal to generate differential 
clocks that provide clocking to the processor and the other 
GSC devices. As an added benefit, its cost is relatively low 
in relation to the external clock oscillator and ECL devices 
used in previous products. The USYNC signal, which comes 
from the PA 7300LC processor, is the synchronizing signal 
that is responsible for aligning the GSCSYNC with the 
processor clock signal. 

Memory and I/O Controller 

The proximity and working relationship between the 
PA 7300LC and B-class system design teams allowed us 
to communicate design specifications with relative ease. 
This working environment allowed us to view the product 
as a whole rather than designing the system around an 
existing chip. 

The design of the memory and I/O controller (MIOC) was 
the first area affected by this arrangement. The PA 7300LC is 
designed to support optional second-level caches of different 
technologies and sizes. When the PA 7300LC chip design 
team began investigating each of these second-level cache 
options, the B-class system designers were able to check 
the appropriateness of their solution with the design. One of 
the first decisions under this arrangement was to make the 
second-level cache optional and locate it on a DIMM (dual 
inline memory module) on a separate board. This provides 
the B-class workstation with several benefits: 

• Lower-performance systems are not burdened with the cost 
of the second-level cache. 

• Systems with and without a second-level cache can share 
system boards, reducing development and verification time. 

• The exact configuration of the second-level cache can be 
altered at a later date if market conditions warrant. 

• Less space is required on the board, permitting a 
lower-cost system board. 

It was important for the PA 7300LC design team to know 
that a DIMM solution was being considered since it would 
have a big impact on the I/O pad design of the PA 7300LC. 

Another area of concern within the MIOC involved the 
impact of the expanded data bus on the PA 7300LC (144 bits) 
compared to the PA 7100LC (72 bits). This would require 
additional pins and incur additional packaging costs. The 
PA 7300LC design team wanted to share the memory data 
bus with the second-level cache data bus to reduce the 
number of external I/O pins. However, the additional load 
associated with the memory would degrade the response of the 
second-level cache. The PA 7300LC design team suggested 
FET switches, which could be dynamically opened and 
closed to isolate the second-level cache from main memory. 

The B-class system designers were able to verify using 
FET switches in a system environment. However, the only 
devices available that met the enable/disable speed 
requirements were 8-bit devices. This was viewed as an 
unwieldy and expensive solution in the B-class system. 
Working with our materials organization and Texas 
Instruments, the B-class system designers were able to make 
minor specification changes to an existing 24-bit Texas 
Instruments part to improve this speed parameter and cut the 
quantity and cost of the FET switches significantly. The 
B-class system designers verified the signal quality of the 
memory data and second-level cache data of these devices in 
a system environment. 
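
The effect of the wider part on the component count is easy to estimate. The Python sketch below assumes that every line of a 144-bit data bus passes through a switch; the article does not state exactly how many lines were switched, so the resulting counts are illustrative.

```python
def switches_needed(bus_bits: int, bits_per_device: int) -> int:
    # Ceiling division: a partly used device still occupies a full package.
    return -(-bus_bits // bits_per_device)


BUS_BITS = 144  # assumed number of switched data lines

print(switches_needed(BUS_BITS, 8))   # with 8-bit devices
print(switches_needed(BUS_BITS, 24))  # with 24-bit devices
```

Under that assumption the move from 8-bit to 24-bit parts reduces the package count by a factor of three, from 18 devices to 6.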

As the configuration of the second-level cache solidified, the 
B-class system designers were able to provide the PA 7300LC 
design team with specific information concerning the 
electrical environment in which the PA 7300LC would be 
operating. With this information they were able to run 
simulations of their I/O pad drivers operating within the 
actual system. This led to some changes in their pad designs, 
eliminating potential problems later. 

Memory 

As with most projects, the PA 7300LC design team and the 
B-class system designers had their share of resource 
shortages. One such issue involved the memory family. The 
PA 7300LC is designed to support both fast-page mode and 
extended-data-out DRAMs. In fast-page mode, sequential 
data is driven from the DRAMs on successive column 
addresses following a single row address, rather than 
requiring both the row and column address to be driven on 
each data access. Extended-data-out DRAMs are an enhancement 
to fast-page mode DRAMs in which the data remains valid until 
the column address changes or a new column address strobe 
occurs, rather than becoming invalid when the column 
address strobe disappears. This allows a longer time period 
over which to latch incoming data and saves processor 
states in memory accesses. 
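
The saving from fast-page mode can be illustrated by counting address phases. The Python sketch below is a simplified model that ignores precharge, refresh, and actual timing; it only counts how many row and column addresses must be driven for a burst of sequential accesses that fall within one row. The function names are hypothetical.

```python
def address_phases_without_page_mode(accesses: int) -> int:
    # Every access drives a row address and then a column address.
    return 2 * accesses


def address_phases_fast_page_mode(accesses: int) -> int:
    # One row address, then one column address per access in the same row.
    return 1 + accesses if accesses > 0 else 0


burst = 4  # four sequential words within the same row
print(address_phases_without_page_mode(burst))
print(address_phases_fast_page_mode(burst))
```

For a four-word burst the model needs five address phases instead of eight, and the advantage grows with the length of the burst. Extended-data-out parts keep the same addressing pattern and differ only in how long the data stays valid.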

Unfortunately, resource conflicts and schedule constraints 
made it impossible for the PA 7300LC design team to verify 
functionality of the chip for both memory technologies. 



The PA 7300LC design team wanted to qualify the 
extended-data-out DRAM technology because it would provide a 
higher-performance memory technology. The B-class system 
design team wanted the fast-page mode DRAM technology 
qualified to be compatible across the workstation family, 
rather than having a unique memory component for the B-class 
systems. The compromise solution was to have the PA 7300LC 
design team qualify the fast-page mode DRAM technology for 
first release. At a later point in the design phase, the 
B-class system designers would qualify the operation of 
extended-data-out mode DRAMs to be introduced as a 
performance enhancement. 

Data Capture 

Resource balancing was also evident in the development 
of a data capture board for the PA 7300LC. A data capture 
board is a device that is attached to a system board and is 
used to observe the high-frequency signals between the 
processor, second-level cache, and memory for debugging 
purposes. Since the B-class system designers were more 
familiar with board design tools and the board design 
environment, the B-class system design team developed the 
data capture board for debugging the PA 7300LC. 

Hardware and Firmware Trade-offs 

Design teams frequently look at trade-offs between 
optimizing resources and meeting the goals of the team. For 
the B-class workstation, the hardware and firmware teams 
fostered a close working relationship, allowing trade-offs to 
be made on a broader scale. 

A most unusual but significant outcome of this close working 
relationship was the development of an unplanned ASIC 
for interfacing to the processor dependent hardware (PDH). 
The PDH consists of components such as the boot ROM, 
nonvolatile memory, and configuration registers. Although 
there was already a way to connect to the PDH functionality 
through part of the core I/O logic being leveraged from 
previous lower-end workstations, this interface did not 
provide the level of functionality that was implemented in 
the higher-end workstations. The firmware team could save 
significant resources by leveraging portions of code from 
the C-class workstation and the higher-end members of the 
D-class server family. Many of the basic I/O and graphics 
functions were similar between these platforms. However, the 
code leverage was predicated on having certain PDH 
functionality that could not be provided with the low-end 
solution. In addition, the high-end solution provided 
superior debug capabilities. These better debug capabilities 
were very attractive to help ensure a speedy startup of the 
new PA 7300LC processor, and hence help meet our 
time-to-market goals. 

The key capability missing from the PDH interface used in
previous lower-end workstations was the ability to perform
word-wide write accesses to PDH devices. The PDH inter-
face was optimized for reads, with only byte write capabili-
ties provided. The new PDH ASIC added the word-write
capability to support a scratch RAM. This seemingly inno-
cent scratch RAM was key, because in high-end workstation
code it is used as a stack in the early stages of the boot pro-
cess before main memory is initialized. The scratch RAM is
also used for global information such as tables of I/O and
graphics configuration information. It would have been very
difficult to leverage code with the word-write capability to a
platform without this capability.

The new PDH ASIC also provided additional address decode
and the appropriate flexibility in timing to allow the direct
connection of a serial port into the PDH hardware. This
direct connection to a serial port, in conjunction with the
capabilities offered with the scratch RAM, allowed a debug-
ger to be operational even with hardware that was minimally
functional. This serial port aided code and hardware debug-
ging by allowing the hardware status to be monitored and
the hardware configuration to be modified early in the boot
process.

The risk for the new PDH ASIC was minimized by incorpo-
rating it into system simulation efforts and by keeping the
design focused on the needed functionality and disallowing
any unnecessary features.

Product Definition 

The B-class system was originally defined alongside the
C-class workstations. The B-class system is essentially a
smaller version of the C-class workstation. Our original
intention for the B-class implementation was to use the
same modular philosophy of separate I/O, CPU, disk inter-
face, and human interface subsystems used in the C-class
machines. However, when the time came to implement the
B-class product, cost goals had become more important.
When preliminary costs were evaluated, it became clear that
we were not meeting the cost objectives with the existing
product definition.

Many alternatives were generated and evaluated against
product objectives. Finance and R&D reviewed their cost
models to see where costs could be saved. Manufacturing
reviewed the design alternatives for manufacturability and
analyzed the supply chain for issues associated with parts
procurement, assembly, material, and structure. Service was
consulted to review serviceability and warranty implications
of the various options, as well as issues with potential future
upgrade products. The result of this analysis was a single-
board integrated computer (see Fig. 4). The design, which
was initially spread out over four separate boards in the
C-class system for the sake of modularity, was now inte-
grated onto one system board.

Single-Tray Concept 

Like the C-class workstation, the B-class workstation uses
a tray concept. However, instead of two trays (one for the
disks and one for the boards), there is one tray that holds
everything (see Fig. 5). For this reason, during the design
phase it was important to consider keeping the weight
down. Holes were added in the tray wherever possible to
reduce the overall product weight.

EMI 

The tray assembly slides into a metal can. With this approach,
the EMI (electromagnetic interference) interface is limited
to the perimeter of the rear panel. Once the tray is removed,
there is easy access to the option boards, memory modules,
second-level cache modules, optional fast-wide SCSI interface
board, power supply, disk drives, speaker, fans, and the CPU
chip. The system board is accessible by removing the disk
bay, which is secured by only one screw and a few cables.

Disk drives can be accessed without removing the disk bay
from the main tray simply by removing the snap-on cover.
The disks are mounted using plastic brackets so that they
can be changed without tools (Fig. 6). A fan was added to
the bottom of the disk bay to provide enhanced disk cooling
since successive generations of disks consume more power.
Removing the backplane is slightly more difficult, requiring
all modules to be removed first.

Manufacturing 

Working with manufacturing included performing a supply
chain analysis 1 as part of the total system cost analysis.
Design efforts produced detailed material lists that were
used to determine an overall system cost. Several design
scenarios were developed with mechanically exploded
models and material lists. The cost model for the B-class
workstation was not limited just to the material content of
the product, but also included system interconnect costs,
parts procurement costs, part placement costs, printed cir-
cuit board electrical and system functional testing costs, and
system support costs.
Fig. 5. The main tray assembly.

Manufacturing and field-support representatives were in-
volved in defining the system for manufacturability, inven-
tory control, and configurability to reduce the system cost.
Design scenarios were then evaluated against each design
objective.

Initial prototypes were assembled and disassembled by
manufacturing personnel to provide a hands-on critique of
the designs. These inputs were fed back into the design in
the early stages of development.

Serviceability 

One of the challenges of this single-board solution was to
make the system board accessible for service. We wanted to
have the board slide in and out, but there were connectors
and switches on both edges of the board. In addition, the
connectors had to be accessible through the rear panel.
To allow the board to slide into the package we added a
small tray to the bottom of the system board that could slide
along card guides. One of the design requests from service
representatives was to be able to service the system board
without removing the rear panel. To accomplish this, the
rear-panel connectors were recessed and a separate small
bulkhead attached with a sliding EMI interface to the rear
panel. This bulkhead remains attached to the board in a
board replacement.

Fig. 6. The disk bay.

Another serviceability concern was the alignment of the
power button, mute button, volume knob, audio jacks, and
LEDs located on the front of the product. In the chassis, the
main tray engages alignment pins, which serve to lock the
tray to the chassis during vibration and shock. Because of
the tolerance stackup from the front panel through the can,
tray, backplane, system board, and all the connectors and
buttons, we were concerned that the cosmetics at the front
would be unacceptable. To improve alignment, we mounted
these connectors on a long, thin section of the printed cir-
cuit board that would flex and be supported by a metal
brace so that the front section could move relative to the
rest of the board. We added aligning forks to the front panel
to position just that section of board. With this method, we
were able to locate these connectors accurately.

Manufacturing also assisted in improving the design through
their participation in design reviews. One suggestion led us
to abandon the captive rear-panel fasteners that we had
been planning to use. If the captive screws are not properly
aligned, they can be cross-threaded and stripped, or the
captive nut on the chassis may be damaged. Consequently,
the whole tray would need to be replaced just for a simple
nut or screw. Instead, we designed custom thumbscrews
with an unthreaded nose section to align the screw before
the threads engage. This minimizes cross-threading. To save
labor costs we also used a coarse thread to reduce the num-
ber of rotations necessary to remove and install them.

Another goal was to reduce the number of screw types. We
tried to standardize on a single screw used in our earlier
option boards because this was the one screw that we could
not change. We used it to attach the power supply and the
disk bay. To reduce screw count, the main fans and speaker
snap in place. The backplane slides in place with keyhole
standoffs and forks in the main tray. When the power supply
is installed, two pins from the power supply trap the back-
plane in place. The power supply is supported with two
screws and the two pins that are routed through the back-
plane into the backplane support. The power supply has
floating connectors so that stresses from vibration and shock
are not transmitted between the backplane and the power
supply via the connectors.

One of the primary objectives was upgradability. Upgrades
can be easily accomplished by a simple swap of the system
board. Since everything is on one board, there are no issues
with incompatibilities between different versions of the I/O
and CPU. The small I/O bulkhead stays with the board so the
main tray assembly need not change. Sufficient extra height
remains where the CPU and memory are located so that
future high-power CPUs have room for larger heat sinks
or even small daughter boards if more board real estate is
needed.
Processor and System Verification

The verification effort for the PA 7300LC and the B-class and
C-class products was also a joint effort. Shmoo tests were
conducted simultaneously on both the B-class and C-class
workstations.

A shmoo test is designed to verify the product under volt-
age, temperature, and frequency extremes. Its intention is to
electrically stress the system under test to within and be-
yond its operating limits. This process is part of our electri-
cal characterization of the processor and system. A shmoo
test is an important part of our product development cycle.
By pushing the system to its electrical extremes, we hope to
reveal any design weaknesses that could affect the operation
and performance of the system under extreme operating
conditions. It often uncovers weaknesses in both chip and
board designs. These might include signal cross talk, chip-
drive capability, slow-speed paths at high temperatures, or
board-level clocking problems.
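
The idea of a shmoo sweep can be illustrated with a short sketch. The code below is not the procedure used by the design teams; it only shows the general technique of running a pass/fail check at every point of a voltage and frequency grid and drawing the result. The names run_shmoo, render_shmoo, and toy_device are hypothetical, the device model is invented for illustration, and temperature is left out to keep the example small.

```python
from typing import Callable, Sequence


def run_shmoo(
    voltages_mv: Sequence[int],
    frequencies_mhz: Sequence[int],
    passes: Callable[[int, int], bool],
) -> list[list[bool]]:
    """Run the pass/fail check at every (voltage, frequency) point."""
    return [[passes(v, f) for f in frequencies_mhz] for v in voltages_mv]


def render_shmoo(
    grid: list[list[bool]],
    voltages_mv: Sequence[int],
    frequencies_mhz: Sequence[int],
) -> str:
    """Draw a text shmoo plot: '#' marks a pass, '.' marks a fail."""
    lines = []
    # Highest voltage goes on the top row, as on a conventional plot.
    for v, row in reversed(list(zip(voltages_mv, grid))):
        cells = " ".join("#" if ok else "." for ok in row)
        lines.append(f"{v:>5} mV  {cells}")
    lines.append("MHz: " + ", ".join(str(f) for f in frequencies_mhz))
    return "\n".join(lines)


def toy_device(voltage_mv: int, freq_mhz: int) -> bool:
    """Hypothetical device model (an assumption, not measured data):
    the maximum working frequency rises with the supply voltage."""
    return freq_mhz * 20 <= voltage_mv - 300


if __name__ == "__main__":
    volts = [3000, 3300, 3600]
    freqs = [120, 132, 144, 160]
    print(render_shmoo(run_shmoo(volts, freqs, toy_device), volts, freqs))
```

In a real characterization the passes callback would set the supply, clock, and chamber temperature and then run a diagnostic on the hardware. The boundary between passing and failing points is where weaknesses such as slow paths tend to show up.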

To achieve superior product quality, both processor and
system shmoo tests were performed on B-class systems. The
processor shmoo test focused on the core processor, caches,
memory, and GSC bus. The system shmoo test emphasized
peripherals and I/O, including the expansion I/O on the GSC,
EISA, and PCI buses.

Since the PA 7300LC was designed to work in both B-class
and C-class systems, it was tested in both systems. Proces-
sor characterization was performed in the C-class systems
by the PA 7300LC design team. Simultaneously, the B-class
system design team completed the processor shmoos in a
B-class system. Both the B-class and C-class system design
teams completed system shmoos with the PA 7300LC in
their respective environments.

The parallel verifications of the PA 7300LC in the B-class
and C-class systems complemented each other, providing
opportunities for leveraging and making the debug process
go smoother. One of the issues discovered during the pro-
cessor shmoo test was the limited operating frequency of
the GSC bus. This was caused by the length and load on the
bus and a threshold problem on the PA 7300LC. The com-
bined efforts of the PA 7300LC processor and B-class system
design teams extended the operating frequency of the GSC
bus in our systems and provided the desired performance.
The PA 7300LC design team corrected the threshold problem
and the B-class team shortened the GSC bus, which slightly
changed its characteristic impedance and helped to alleviate
the problem.

Processor-level electrical verification has three main goals:
uncover electrical (nonfunctional) bugs in the system, find
critical speed paths that limit the maximum frequency of the
processor, and provide correlation between the IC tester
frequency and the eventual system frequency. The third goal
had the biggest impact on costs. As development progressed,
it became obvious to the PA 7300LC and B-class teams that
the frequency mix (132 MHz to 160 MHz) between the IC
tester and the system was not meeting marketing require-
ments. The correlation effort between the teams uncovered
ways to enhance the system electrical and thermal environ-
ments to bring the yield mix and market demand together.
The close cooperation between the two teams enabled
the quick identification of a solution to the problem. We
made alterations to the system's thermal cooling environ-
ment, allowing us to run the PA 7300LC at a higher fre-
quency, something we could not do in the original cooling
environment.

Over the years, many efforts have been made to address and
improve the shmoo test process at both the processor
and system level. While processor shmoo testing reveals
many system level problems, its primary focus is still the
processor, cache, and memory subsystems, rather than the
I/O subsystems. As I/O bus speed and peripheral interface IC
complexity have increased, it has become more important to
address the I/O subsystems in shmoo testing. The PA 7300LC
was designed to make complete system shmoos more prac-
tical for this reason. The clock circuitry for the PA 7300LC
was designed to permit overriding the nominal clock fre-
quency while maintaining the correct synchronous relation-
ship between the processor and I/O clocks. This allowed us
to vary the frequency of operation more easily over a larger
range of operation than in past products.

One of the challenges for system shmoo testing in B-class
systems was the range of new system components that had
to function correctly together during testing. As with the
processor shmoo, system testing attempted to stress the
electrical design of the new components by operating them
under extremes of temperature, voltage, and frequency. In
addition to the core I/O components, various expansion I/O
cards were selected to verify complete system functionality.

The extensive system shmoo testing of the B-class system
led to the optimization of several circuits and resulted in a
higher-performing, more robust system. We have come to
believe that shmoo tests are an indispensable part of our
product development. Besides helping to catch potential
problems before introduction, shmoo tests also make post-
product support and maintenance easier.

Conclusion 

Cooperative efforts between many functional areas such as
manufacturing, service and support, marketing, firmware
development, and the PA 7300LC chip development team
together with the electrical and mechanical system designers
have produced the B-class workstations. The closely coupled
system design approach has yielded a workstation that pro-
vides significant value to our customers.
Testing Safety-Critical Software



Testing safety-critical software differs from conventional testing in that
the test design approach must consider the defined and implied safety of
the software at a level as high as the functionality to be tested, and the
test software has to be developed and validated using the same quality
assurance processes as the software itself.

by Evangelos Nikolaropoulos 



Test technology is crucial for successful product develop-
ment. Inappropriate or late tests, underestimated testing
effort, or wrong test technology choices have often led
projects to crisis and frustration. This software crisis results
from neglecting the imbalance between constructive software
engineering and analytic quality assurance. In this article we
explain the testing concepts, the testing techniques, and the
test technology approach applied to the patient monitors of
the HP OmniCare family.

Patient monitors are electronic medical devices for observ-
ing critically ill patients by monitoring their physiological
parameters (ECG, heart rate, blood pressure, respiratory
gases, oxygen saturation, and so on) in real time. A monitor
can alert medical personnel when a physiological value
exceeds preset limits and can report the patient's status on
a variety of external devices such as recorders, printers, and
computers, or simply send the data to a network. The moni-
tor maintains a database of the physiological values to show
the trends of the patient's status and enable a variety of
calculations of drug dosage or ventilation and hemodynamic
parameters.

Patient monitors are used in hospitals in operating rooms,
emergency rooms, and intensive care units and can be con-
figured for every patient category (adult, pediatric, or neo-
nate). Very often the patient attached to a monitor is uncon-
scious and is sustained by other medical devices such as
ventilators, anesthesia machines, fluid and drug pumps, and
so on. These life-sustaining devices are interfaced with the
patient monitor but not controlled from it.

Safety and reliability requirements for medical devices are
set very high by industry and regulatory authorities. There is
a variety of international and national standards setting the
rules for the development, marketing, and use of medical
devices. The legal requirements for electronic medical
devices are, as far as these concern safety, comparable to
those for nuclear plants and aircraft.

In the past, the safety requirements covered mainly the hard-
ware aspects of a device, such as electromagnetic compati-
bility, radio interference, electronic parts failure, and so on.
The concern for software safety, accentuated by some widely
known software failures leading to patient injury or death, is
increasing in the industry and the regulatory bodies. This
concern is addressed in many new standards or directives
such as the Medical Device Directive of the European Union
or those of the U.S. Food and Drug Administration. These legal
requirements go beyond a simple validation of the product;
they require the manufacturer to provide all evidence of
good engineering practices during development and valida-
tion, as well as the proof that all possible hazards from the use
of the medical instrument were addressed, resolved, and
validated during the development phases.

The development of the HP OmniCare family of patient moni-
tors started in the mid-1980s. Concern for the testing of the
complex safety-critical software to validate the patient mon-
itors led to the definition of an appropriate testing process
based on the ANSI/IEEE software engineering standards pub-
lished in the same time frame. The testing process is an inte-
gral part of our quality system and is continuously improved.

The Testing Process 

During the specifications phase of a product, extended dis-
cussions are held by the crossfunctional team (especially
the R&D and software quality engineering teams) to assess
the testing needs. These discussions lead to a first estimation
of the test technology needed in all phases of the develop-
ment (test technology is understood as the set of all test
environments and test tools). In the case of HP patient mon-
itors the discussion started as early as 1988 and continues
with every new revision of the patient monitor family, refin-
ing and in some cases redefining the test technology. Thus,
the test environment with all its components and the tools
for the functional, integration, system, and localization test-
ing evolved over a period of seven years. Fig. 1 illustrates
the testing process and the use of the tools.

The test process starts with the test plan, a document de-
scribing the scope, approach, resources, and schedule of the
intended test activities. The test plan states the needs for
test technology (patient simulators, signal generators, test
tools, etc.). This initiates subprocesses to develop or buy the
necessary tools.

Test design is the documentation specifying the details of
the test approach and identifying the associated tests. We
follow three major categories of test design for the genera-
tion of test cases (one can consider them as the main direc-
tions of the testing approach): white box, black box, and
risk and hazard analysis.

The white box test design method is for design test, unit
test, and integration tests. This test design is totally logic
driven and aims mainly at path and decision coverage. Input
for the test cases comes from external and internal specifi-
cations (design documents). The test design for algorithm
validation (proof of physiological measurement algorithms)
follows the white box method, although sometimes this is
very difficult, especially for purchased algorithms.

The black box test design method is for functional and
system test. This test design is data-driven and aims at the
discovery of functionality flaws by using exhaustive input
testing. Input for the test cases comes from the external
specifications (as perceived by the customer) and the in-
tended use in a medical environment.

Risk and hazard analysis is actually a gray box method that
tries to identify the possible hazards from the intended and
unintended use of the product that may be potential sources
of harm for the patient or the user, and to suggest safeguards
to avoid such hazards. Consider, for instance, a noninvasive
blood pressure measurement device that may overpump.
Hazard analysis is applied to both hardware (electronic and
mechanical) and software, which interoperate and influence
each other. The analysis of events and conditions leading to
a potential hazard (the method used is the fault tree, a cause-
and-effect graph) goes through all possible states of the
hardware and software. The risk level is estimated (the risk
spectrum goes from catastrophic to negligible) by combining
the probability of occurrence and the impact on health. For
all states with a risk level higher than negligible, appropriate
safeguards are designed. The safeguards can be hard or soft
(or in most cases, a combination of both). The test cases
derived from a hazard analysis aim to test the effectiveness
of the safeguards or to prove that a hazardous event cannot
occur.
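
As a rough illustration of how a probability of occurrence and an impact on health can be combined into a risk level, consider the following sketch. The scales, the intermediate level names, the multiplication rule, and the thresholds are all assumptions made for this example; the article states only that the spectrum runs from catastrophic to negligible and that every state rated higher than negligible receives a safeguard. The two example hazard states are likewise hypothetical.

```python
# Hypothetical ordinal scales; only the end points of the risk
# spectrum ("negligible" and "catastrophic") come from the article.
PROBABILITY = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}
IMPACT = {"negligible": 1, "minor": 2, "serious": 3, "catastrophic": 4}


def risk_level(probability: str, impact: str) -> str:
    """Combine probability and impact into a risk level.

    The product rule and the thresholds are illustrative assumptions.
    """
    score = PROBABILITY[probability] * IMPACT[impact]
    if score <= 2:
        return "negligible"
    if score <= 6:
        return "moderate"
    if score <= 9:
        return "critical"
    return "catastrophic"


def states_needing_safeguards(states: dict[str, tuple[str, str]]) -> list[str]:
    """Return every state whose risk level is higher than negligible."""
    return [
        name
        for name, (probability, impact) in states.items()
        if risk_level(probability, impact) != "negligible"
    ]


if __name__ == "__main__":
    # Invented states for a noninvasive blood pressure measurement.
    nibp_states = {
        "cuff overpumps because valve is stuck": ("remote", "catastrophic"),
        "reading delayed by a few seconds": ("improbable", "minor"),
    }
    for name in states_needing_safeguards(nibp_states):
        print("safeguard required:", name)
```

Each state returned by states_needing_safeguards would then get a hard or soft safeguard, and at least one test case aimed at showing that the safeguard is effective.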

Test cases consist of descriptions of actions and situations,
input data, and the expected output from the object under
test according to its specifications.



Fig. 1. The software testing pro-
cess for HP OmniCare patient
monitors.

Test procedures are the detailed instructions for the setup,
execution, and evaluation of results for one or more test
cases. Inputs for their development are the test cases
(which are always environment independent) and the test
environment as defined and designed in the previous phases.
One can compare the generation and testing of the test pro-
cedures to the implementation phase of code development.

Testing or test execution consists of operating a system or
component under specified conditions and recording the
results. The notion of testing is actually much broader and
can start very early in the product development life cycle
with specification inspections, design reviews, and so on.
For this paper we limit the notion of testing to the testing
of code.

Test evaluation is the reporting of the contents and results
of testing and incidents that occurred during testing that
need further investigation and debugging (defect tracking).

While test design and the derivation of test procedures are
done only once (with some feedback and rework from the
testing in early phases, which is also a test of the test), test-
ing and test evaluation are repeatable steps usually carried
out several times until the software reaches its release
criteria.

Various steps of the testing process also produce test docu-
mentation, which describes all the plans for and results of
testing. Test or validation results are very important for doc-
umenting the quality of medical products and are required by
regulatory authorities all over the world before the product
can be marketed.

The regression test package is a collection of test procedures
that can be used for selective retesting of a system or compo-
nent to verify that modifications have not caused unintended
effects and that the system or component still complies with
its specified product and legal requirements.
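
Selective retesting can be pictured with a small sketch: given a record of which components each test procedure exercises, pick the procedures that touch anything that was modified. This is only one simple selection rule, shown under stated assumptions; the function name select_regression_tests and the procedure and component names are hypothetical and do not describe the actual regression package.

```python
def select_regression_tests(
    coverage: dict[str, set[str]],
    modified: set[str],
) -> list[str]:
    """Return the test procedures that exercise a modified component.

    coverage maps a test procedure name to the set of components it
    exercises; modified is the set of components that were changed.
    """
    return sorted(
        name for name, components in coverage.items() if components & modified
    )


if __name__ == "__main__":
    # Invented procedure and component names, for illustration only.
    coverage = {
        "TP-alarm-limits": {"alarm_manager", "ecg"},
        "TP-trend-database": {"trend_db"},
        "TP-recorder-output": {"recorder", "alarm_manager"},
    }
    print(select_regression_tests(coverage, {"alarm_manager"}))
```

For a safety-critical product the selection would normally be widened, for example by always including procedures derived from the hazard analysis regardless of what changed.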
From the Test Plan to the Testware

Ten or fifteen years ago it was perhaps enough to give a 
medical instrument to some experts for clinical trials, 
algorithm validation, and test. The instrument had a simple 
display, information placement was not configurable, and 
the human interface was restricted to a few buttons and 
knobs. All the attention was on the technical realization of 
the medical solution (such as ECG monitoring), and software, 
mainly written by the electrical engineers who designed the 
hardware, was limited to a few hundred statements of low- 
level languages. 

Today the medical instruments are highly sophisticated open-
architecture systems, with many hundreds of thousands of lines
of code. They are equipped with complex interfaces to other
instruments and the world (imagine monitoring a patient over
the Internet: a nightmare and a challenge at the same time).
They are networked and can be remotely operated. This
complexity and connectivity requires totally new testing
approaches, which in many cases are not feasible without
the appropriate tooling, that is, the testware.

Discussion of the test plan starts relatively early in the prod-
uct life cycle and is an exit criterion for the specifications
phase. One of the major tasks of the testing approach is the
assessment of the testing technology needed. The term tech-
nology is used here in its narrow meaning of process plus
hardware and software tools.

The testing technology is refined in the next phases (design
and implementation) and grows and matures as the product
under development takes shape. On the other hand, the test-
ing tools must be in place before the product meets its im-
plementation criteria. This means that they should be imple-
mented and validated before the product (or subproduct) is
submitted for validation. This requirement illustrates why
the test technology discussion should start very early in the
product life cycle, and why the testware has a "phase shift
to the left" with respect to the product validation phase (see
Fig. 2).

Test Tool (Testware) Development 

Testware development follows the same product life cycle
as the product under development. The phases are:

• Requirements and Definition Phase. The test needs are
explained according to the test plan and the high-level test
design. Alternatives are discussed and one is selected by the
software quality team.

• Specifications Phase. The tool is described in as much de-
tail as possible to enable the testers to start work with their
test cases and test procedures as if the tool were already
available. These specifications are reviewed (or formally
inspected) by the product development and test teams,
approved, and put under revision control.

• Design and Implementation Phase. Emphasis is on the rapid
development of engineering prototypes of the tools, which
again are the object of review. These prototypes are used by
the test team for low-level test design and first test trials.

• Validation Phase. The test tool is validated against its speci-
fications. The most up-to-date revision of the patient moni-
tor under development is used as a test bed for the tool's
validation. Notice the inversion of roles: the product is used
to test the test tool! Our experience shows that this is the
most fruitful period for finding bugs in both the tool and
the product. A regression package for future changes is
created. First hardware construction is started if hardware
is involved.

• Production Phase. The tool is officially released, hardware
is produced (or bought), and the tool is used in the test envi-
ronment. After some period of time, when the tool's maturity
for the test purposes has been assessed, the tool is made
public for use by other test departments, by marketing for
demos, by support, and so on.

Fig. 2 demonstrates the main difficulty of testware develop-
ment: the test tool specifications can be created after the
product specifications, but from this point on, all of the test-
ware development phases should be earlier than the product
development phases if the product is to be validated in a
timely manner.

Besides the shift of the development phases, there is also
the testware dilemma: as the progress of the product's de-
sign and the test design leads to new perceptions of how
the product can be tested, new opportunities or limitations
appear that were previously unknown, and influence the
scope of the testware. The resulting changes in the testware
must then be made very quickly, more quickly than the
changes in the product. Only the application of good hard-
ware and software engineering processes (the tester is also
a developer) can avoid having the wrong test tool for the
product.
Another Approach to Testing:
Inspections 

Inspections have proved to be a very powerful white box testing method.
Inspections are performed in all phases of the product life cycle starting
with the product specifications (external specifications inspections are
an exit criterion for the specifications phase). The inspection goals are
100% for specifications (exit criterion), 50% for product design (for the
most critical parts according to the results of the risk and hazard analysis
and the product architecture), and 20% for code (most critical parts).

Although there is no defined goal for test design inspections, in practice
about 75% of test design (high-level and test procedures just before
automation) is inspected formally. Code inspections are performed by
two or three engineers. Specifications inspection teams are larger; the
crossfunctional team as well as R&D experts are participants. The in-
spection process has the following steps:

• Kickoff. Distribution of inspection material and logging meeting logistics.

• Logging. All items are logged. Short explanatory discussions are allowed
(less than three or four minutes). A moderator and a scribe are always
assigned to facilitate the logging meeting. However, there is no chief
moderator assigned in our laboratory.

• Follow-up and Rework. This step ensures that all of the fixes and clarifi- 
cations that were identified as necessary in the logging meeting are 
done and all items are addressed. In an informal message to all partici- 
pants of the logging meeting, all fixes are explained and the reasons for 
the unfixed issues are discussed. 

After extensive training and with massive management support, this
inspection process works very well and is a fixed part of the product life
cycle.



AutoTest 

The test technology assessment for the patient monitors led
us to the development of a number of tools that could not be
found on the market. This make instead of buy decision was
based mainly upon the nature of the patient monitors, which
have many CPUs, proprietary operating systems and net-
works, proprietary human interfaces, true real-time behav-
ior, a lot of firmware, and a low-level, close-to-the-machine
programming style. Testing should not be allowed to influ-
ence the internal timing of the product, and invasive testing
(having the tests and the objects under test on the same
computer) had to be avoided.

The first tool developed was AutoTest,1 which addressed
the need for a tool able to (1) simulate the patient's situation
by driving a number of programmable patient simulators,
(2) simulate user interactions by driving a programmable
keypusher, and (3) log the reaction of the instrument under
test (alarms, physiological values, waves, recordings, etc.)
by taking, on demand, snapshots of the information to send
to the medical network in a structured manner.

AutoTest was further developed to accept more simulators of
various parameters and external non-HP devices such as
ventilators and special measurement devices attached to the HP
patient monitor. AutoTest now can access all information
traveling in the internal bus of the instrument (over a serial
port with the medical computer interface) or additional
information sent to external applications (see article, page 103).



AutoTest is now able to: 

• Read a test procedure and interpret the instructions to 
special electronic devices or PCs simulating physiological 
signals 

• Allow user input for up to 12 patient monitors simultaneously
over different keypushers (12 is the maximum number of
RS-232 interfaces in a PC)

• Allow user input with context-sensitive keypushing (first
search for the existence and position of an item in a menu
selection and then activate it)

• Maintain defined delays and time dependencies between
various actions and simulate power failure conditions

• Read the reaction of the device under test (alarms,
physiological values and waves with all their attributes, window
contents, data packages sent to the network, overall status
of the device, etc.)

• Drive from one PC simultaneously the tests of up to four
patient monitors that interact with each other and exchange
measurement modules physically (over a switch box)

• Execute batch files with any combination of test procedures

• Write to protocol files all actions (user), simulator commands
for physiological signals (patient), and results (device under
test) with the appropriate time stamps (with one-second
resolution).

AutoCheck 

The success of AutoTest and the huge amount of data produced
as a result of testing quickly led to the demand for an
automated evaluation tool. The first thoughts and desires
were for an expert system that (1) would represent explicitly
the specifications of the instrument under test and the rules
of the test evaluation, and (2) would have an adaptive
knowledge base. This solution was abandoned early for a more
versatile procedural solution named AutoCheck (see article,
page 103). By using existing compiler-building knowledge
we built a tool that:

• Enables the definition of the expected results of a test case
in a formal manner using a high-level language. These
formalized expected results are part of the test procedure and
document at the same time the pass-fail criteria.

• Reads the output of AutoTest containing the expected and
actual results of a test.

• Compares the expected with the actual results.

• Classifies and reports the differences according to given
criteria and conditions in error files similar to compiler
error files.
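
The compare-and-classify steps listed above can be illustrated with a minimal Python sketch. It is not AutoCheck itself: the dictionary-based result format, the `Discrepancy` record, and the two severity classes are assumptions made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    line: int        # position of the expected item (0 if there is none)
    severity: str    # "error" or "warning" (hypothetical classes)
    message: str


def evaluate(expected, actual):
    """Compare expected with actual results and classify the differences."""
    report = []
    for line, (key, want) in enumerate(expected.items(), start=1):
        if key not in actual:
            report.append(Discrepancy(line, "error", f"{key}: no result logged"))
        elif actual[key] != want:
            report.append(Discrepancy(
                line, "error", f"{key}: expected {want!r}, got {actual[key]!r}"))
    # Results that were logged but never expected are reported as warnings.
    for key in actual:
        if key not in expected:
            report.append(Discrepancy(0, "warning", f"{key}: unexpected result"))
    return report


expected = {"alarm": "** HR 45 < 50", "sound": "yellow"}
actual = {"alarm": "** HR 45 < 50", "sound": "none", "extra": 1}
report = evaluate(expected, actual)
for item in report:
    print(f"{item.severity} (item {item.line}): {item.message}")
```

Keeping the classification in a structured report, rather than in free text, is what makes the output usable like a compiler error file.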

AutoCheck has created totally new and remarkable possibilities
for the evaluation of tests. Huge amounts of test data
in protocol files (as much as 100 megabytes per day) can be
evaluated in minutes where previously many engineering
hours were spent. The danger of overlooking something in
lengthy protocols full of numbers and messages is eliminated.
AutoCheck provides a much more productive approach for
regression and local language tests. For local language tests,
it even enables automatic translation of the formalized
pass-fail criteria during run time before comparison with the
localized test results (see article, page 109).
ATP

The next step was the development of a sort of test generator 
that would: 

• Be able to write complex test procedures by keeping the
test design at the highest possible level of abstraction

• Enable greater test coverage by being able to alter the entry
conditions for each test case

• Eliminate the debugging effort for new test procedures by
using a library of validated test primitives and functions

• Take account of the particularities and configurations of
the monitors under test by automatic selection of the test
primitives for each configuration

• Produce (at the same time as the test setup and the entry
conditions) the necessary instructions for automated
evaluation by AutoCheck.

The resulting tool is called ATP (Automated Test Processor,
see article, page 95). Like AutoCheck, ATP was developed
by using compiler-building technology.

Results 

Good test design can produce good and reliable manual
tests. The industry has had very good experience with sound
manual tests in the hands of experienced testers. However,
there is no chance for manual testing in certain areas of
functionality such as interprocess communication, network
communication, CPU load, system load, and so on, which
can only be tested with the help of tools. Our process now
leaves the most tedious, repetitive, and resource-intensive
parts of the testing process for the automated testware:

• ATP for the generation of test procedures in a variety of
configurations and settings based on a high-level test design

• AutoTest for test execution, 24 hours a day, 7 days a week,
with unlimited combinations of tests and repetitions

• AutoCheck for the automated evaluation of huge amounts
of test protocol data.

One of the most interesting facets is the ability of these tools
to self-document their output with comments, time stamps,
and so on. Their output can be used without any filtering to
document the test generation with pass-fail criteria, test
execution with all execution details (test log), and test
evaluation with a classification of the discrepancies (warnings,
errors, range violations, validity errors, etc.).



Automated testware provides us with reliable, efficient, and
economical testing of different configurations or different
localized versions of a product using the same test environment
and the same test procedures. By following the two
directions of (1) automated testware for functional, system,
and regression tests (for better test coverage), and (2) inspection
of all design, test design, and critical code (as identified
by the hazard analysis), we have achieved some remarkable
results, as shown in Fig. & 

Through the years the patient monitor software has become
more and more complex as new measurements and interfaces
were added to meet increased customer needs for
better and more efficient healthcare. Although the software
size has grown by a factor of three in six years (and with it
the testing needs), the testing effort, expressed as the number
of test cycles times the test cycle duration, has dropped
dramatically. The number of test cycles has dropped or
remained stable from release to release.

The predictability of the release date, or the length of the
validation phase, has improved significantly. There has been
no slippage of the shipment release date with the last four
releases.

The ratio of automated to manual tests is constantly improv- 
ing. A single exception confirms the rule: for one revision, 
lack of automated testware for the new functionality — a 
module to transfer a patient database from one monitor to 
another — forced us to do all tests for this function manually. 

The test coverage and the coverage of the regression testing
have improved over the years even though the percentage of
regression testing in the total testing effort has constantly
increased.

Conclusion 

Software quality does not start and surely does not end with
testing. Because testing, as the term is used in this article, is
applied to the final products of a development phase, defect
discovery through testing always happens too late in each
phase of product development. All the experience gained
shows that defect prevention activities (by applying the
appropriate constructive software engineering methods during
product development in all phases) are more productive than
any analytic quality assurance at the end of the development
process.

Nevertheless, testing is the ultimate sentinel of a quality
assurance system before a product reaches the next phase
in its life cycle. Nothing can replace good, effective testing
in the validation phase before the product leaves R&D to go
to manufacturing (and to our customers). Even if this is the
only and unique test cycle in this phase (if the defect prevention
activities produced an error-free product, which is still
a vision), it has to be prepared very carefully and be well
documented. This is especially true for safety-critical software,
for which, in addition to functionality, the effectiveness of
all safeguards under all possible failure conditions has to be
tested.

In this effort, automated testware is crucial to ensuring
reliability (the testware is correct, validated, and does not
produce false negative results), efficiency (no test reruns
because of testware problems), and economy (optimization of
resources, especially human resources).
A High-Level Programming Language 
for Testing Complex Safety-Critical 
Systems 

Dealing with an enormous amount of data is characteristic of validating
complex and safety-critical software systems. ATP, a high-level
programming language, supports the validation process. In a patient
monitor test environment it has shown its usefulness and power by
enabling a dramatic increase in productivity. Its universal character allows
it to migrate validation scenarios to different products based on other
architectural paradigms.

by Andreas Pirrung 



This article concentrates on the specific problem of transforming
a test design into concrete automatic test procedures.
For a systematic overview and context the reader
is referred to the article on page 8f.i. As described in that
article, the test design identifies and documents the test set
for a given product. It is derived from external and internal
specifications, software quality engineer expertise, and risk
and hazard analysis results. A test design is normally informal
and describes test cases and test data on a high, abstract
level, independent of the test environment. On the other
hand, an automatic test procedure has to deal with all the
details of the test environment and reflects the abstraction
capabilities of the existing tools.



In our software quality engineering department the automatic
test environment is based upon two major tools: AutoTest1
and AutoCheck. The first is a test execution tool and the
latter is responsible for test evaluation (see article, page 103).
AutoTest is very close to the devices it controls and requires
detailed commands on a low abstraction level. AutoCheck
has to cope with the detailed low-level information produced
by AutoTest and therefore also requires input on a detailed,
low abstraction level (see Fig. 1). The strengths of the
low-level interfaces are their flexibility and adaptability to various
different test situations.
Fig. 1. Patient monitor test process.
There are some difficulties with the process shown in Fig. 1.
The test engineer spends a lot of time transforming test designs
into automatic test procedures. There is a large gap in
abstraction level between the test design and the test procedure.
The detail level is low in the test design, but very high
in the test procedure. It is an error-prone, time-consuming
task to bridge this gap manually. Because resources are
always restricted, the software quality engineer has less
time for a more intensive test design.

Because the test procedures have a high explicit redundancy,
it is difficult to maintain and evolve test procedures. The
explicit redundancy is high because AutoTest and AutoCheck
do not support data and functional abstraction, nor
do they offer control flow elements. A piece of code may
exist in many copies scattered over the test procedures. If
the test requires a change in the code pattern, for instance
because of changes in the timing behavior of the system
under test (in our case a patient monitor), the test engineer
has to update numerous copies of this code pattern. The risk
of forgetting one pattern or introducing an error in a test
procedure increases with the number of update steps. It is
very resource-consuming to adapt test procedures to a
change in system behavior.

The test procedure describes a static test scenario. Therefore,
the test engineer has to document the test setup completely.
Every parameter that influences the test environment
and consequently the test execution must be carefully
controlled before starting the automatic test. Our test environment
consists of so many simulators, forcing devices, and
sensing devices that sometimes tests need to be repeated
because the initial conditions are wrong. The problem is that
the automatic test procedure describes only one specific
test situation. It is not possible to use parameters for the
test procedure and to feed in the actual parameters at
the beginning of the test execution to get more general and
robust test procedures. Even a slight change in the start condition
may require an adapted or nearly new test procedure.

The test coverage is limited because the test data is coded
within the test procedure. The repetition of a test case with
other test data requires a modified duplicate of the test procedure.
Again, it would help if a test case were able to profit
from data abstraction and parameters, enabling the test
engineer to formulate more general test procedures.
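
As an illustration of what parameters would buy, the following Python sketch expresses one low-alarm test case as a function of its limits, test value, and delay, so that a second test value needs a second call rather than a modified copy. The step strings are hypothetical, human-readable placeholders; they are not AutoTest or AutoCheck syntax.

```python
def low_alarm_case(low, high, value, delay_s):
    """Return the steps of one low-alarm test case for the given parameters."""
    return [
        f"set HR alarm limits {low}/{high}",
        f"apply HR signal {value}",
        f"wait {delay_s} s",
        f'expect alarm "** HR {value} < {low}"',
    ]


# Two test cases from one definition instead of two hand-edited copies.
cases = [low_alarm_case(50, 80, value, 10) for value in (45, 49)]
for case in cases:
    print("\n".join(case))
    print()
```

A change in timing behavior then means changing one default or one argument, not every copy of the pattern.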

AutoTest and AutoCheck do not support the statistical structural
testing approach. It is therefore not possible to select
test data randomly (see "Structural Testing, Random Testing,
and Statistical Structural Testing" on page 97).

The following section illustrates the above problems by presenting
a practical example to demonstrate the transformation
of a high-level test design to an automatic test procedure.

A Practical Example 

Patient monitors are electronic medical devices used to
monitor physiological parameters of critically ill patients in
intensive care units or operating rooms. They alert the medical
staff when physiological parameters exceed preconfigured
limits. In this example, we will concentrate on a well-known
physiological parameter, the heart rate. The nurses
and doctors want to get an immediate alarm when the heart
rate falls below a given lower limit or exceeds a given higher
limit. A malfunction of the monitor may result in the death
of a patient, so this functionality is safety-critical and must
be validated very carefully by the vendor of the patient monitor.
The example illustrates a test design for heart rate
alarm testing and the transformation process to the appropriate
automatic test procedure.

Fig. 2. Heart rate (HR) alarm test principle.

Fig. 2 shows the upper and lower alarm limits for the heart
rate parameter. The data space can be divided into three
subdomains (equivalence classes):

• The normal heart rate domain: the interval between the
alarm limits. The monitor should not alarm for data points
taken from this area.

• The lower alarm domain. All data here produces a low limit
alarm.

• The upper alarm domain. All data here produces a high limit
alarm.

A classic method of testing the alarm behavior is to select
representatives from each of the three areas and check that
the monitor reacts as expected. Fig. 2 shows the selected
data points and their order in time.
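
The partition into three equivalence classes can be written down as a small Python sketch. The function name and the class labels are chosen for this illustration only. The boundary handling follows the test design in this article, in which a heart rate equal to the lower limit no longer raises the low alarm; treating a value equal to the upper limit as normal is an assumption based on the phrase "exceeds a given higher limit".

```python
def classify(hr, low, high):
    """Map a heart rate value to its subdomain for the given alarm limits."""
    if hr < low:
        return "low alarm"
    if hr > high:
        return "high alarm"
    return "normal"


# Hypothetical representatives for alarm limits 50/80.
representatives = {
    "low alarm": [45, 49],
    "normal": [50, 65, 80],
    "high alarm": [81, 120],
}

for domain, values in representatives.items():
    for value in values:
        print(value, classify(value, 50, 80), "(selected for", domain + ")")
```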

This graphical representation of the heart rate (HR) alarm
test leads to the following test design:

• Test Case 1:
Action(s): Configure HR alarm limits to 50/80. Apply
signal HR 45.
Expected: Low limit alarm with text "**HR 45 < 50".

• Test Case 2:
Action(s): Apply HR signal 49.
Expected: Low limit alarm remains with text "**HR 49
< 50".

• Test Case 3:
Action(s): Apply HR signal 50.
Expected: Low limit alarm disappears.

• Test Case 4:
Action(s): Select 5 different HR values between 50 and 80
and apply them.
Expected: No HR alarm for each of the selected values.

These few test cases are enough to demonstrate the principles
of test design. The appropriate automatic test procedure
description for test case 1 on the AutoTest and AutoCheck
level then looks like:
// Test Case 1:
// Adjust alarm limits to 50-80.
//
merlin param
merlin "HR"
merlin f2
merlin f7
merlin f3 -n48
merlin f6 -al2
merlin f4 -n34
merlin f5 -n5
//
// Apply HR signal 45 (normal sinus beat).
//
siml NSB45
//
// Delay 10 s: after that time the alarm must
// be announced.
wait 10
//
// Check if low limit alarm "** HR 45 < 50" is
// present.
* Verify begin
A Alarm "HR" < al_min;
// Low alarm active.
* sound is c_yellow;
// Limit alarm sound audible.
* Verify end
mecif tune HR 10
// Tune to the HR numeric of the patient
// monitor.


This example demonstrates the difference in abstraction
between a test design (high abstraction) and an automatic
test procedure (low abstraction) and gives an impression of
the difficulties noted above. An automatic test description
language designed to alleviate these difficulties should offer
abstraction capabilities to hide details and to compose complex
functions from simpler functions. Like every high-level
programming language, it should bridge the abstraction gap
automatically. In the following section a solution is presented
that meets these needs.

The ATP Programming Language

Often specific problems need basic processors like AutoTest
or AutoCheck to perform some operation such as pushing
keys, simulating patient signals, simulating power failure
conditions, and so on. A straightforward solution might extend
the command interfaces of AutoTest and AutoCheck to support
data and functional abstraction, provide control flow
elements like conditions and loops, and allow further
probabilistic data generation. This would probably eliminate the
difficulties mentioned above. However, redundant effort
would have to be spent implementing an abstract command
interface again and again.

Our solution is ATP (Automated Test Procedure), a high-level
programming language that offers the abstraction facilities
and makes it possible to integrate basic processors smoothly.
ATP allows the integration of many different basic processors,
so the coordination of the basic processors is much
easier than with separate control.

The following is a typical ATP routine representing the automatic
test procedure for the heart rate alarm test:



Structural Testing, Random Testing,
and Statistical Structural Testing

Random testing is one of the more common test strategies. It does not
assume any knowledge of the system under test, its specifications, or
its internal design. This technique is insufficient for validating complex,
safety-critical or mission-critical software.

The structural testing approach systematically derives the test procedures
from the external and internal specifications. Therefore, the term
test design best describes the mental activity behind this method. The
structural testing approach divides the input data space into subdomains.
The criteria for this partitioning are given by the external specification of
the system. Each subdomain is an equivalence class, which is tested by
choosing some representatives. But what if the subdomain is heterogeneous,
has unknown side effects, or includes errors if executed in a particular
order? (A heterogeneous subdomain includes both good and bad
data points. Good means that the system works as specified, whereas
bad data leads to system failure. For example, in the heart rate alarm
test described in the accompanying article, the high alarm limit domain
may contain data points that, when applied to the patient monitor, produce
no high limit alarm. Other data points may behave as expected.)
Unfortunately, the subdomains are seldom homogeneous or disjoint.

Waeselynck and Thevenot-Fosse1 showed that a statistical component has
to be included to provide a sufficient test data set for a subdomain. This
approach is known as statistical structural testing. Our experience has
shown that this strategy leads to the best results.
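
One simplified reading of this idea is sketched below in Python: the subdomains still come from the specification (the structural part), but the representatives inside each subdomain are drawn at random (the statistical part). The function name, the subdomain labels, and the overall value range of 20 to 180 are assumptions for this illustration; the sketch makes no claim about how the cited authors size their samples.

```python
import random


def sample_subdomains(low, high, hr_min, hr_max, per_domain, seed=None):
    """Draw random heart rate values from each alarm subdomain."""
    rng = random.Random(seed)  # a seed makes a test run repeatable
    domains = {
        "low alarm": range(hr_min, low),
        "normal": range(low, high + 1),
        "high alarm": range(high + 1, hr_max + 1),
    }
    return {
        name: [rng.choice(values) for _ in range(per_domain)]
        for name, values in domains.items()
    }


samples = sample_subdomains(50, 80, 20, 180, per_domain=5, seed=1)
for name, values in samples.items():
    print(name, values)
```

Because the values differ from run to run (unless a seed is fixed), repeated execution gradually probes points that a fixed set of representatives would never reach.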

Reference 

1. H. Waeselynck and P. Thevenot-Fosse, "An Experimentation with Statistical
Testing," Proceedings of the 2nd European International Conference on Software
Testing, Analysis & Review, 1994.



DEFINE AlarmTest ( IN PatientSize CHECK IN {
                       "ADULT", "PEDIATRIC", "NEONATE"
                   },
                   IN Category CHECK IN {"OR", "ICU"}
                 )

DESCRIPTION
  PURPOSE :
    This routine demonstrates some of the
    ATP features. It is an automatic test
    procedure testing the HR alarm
    capabilities.
  SIGNATURE :
    AlarmTest ( <PatientSize> , <Category> )
END DESCRIPTION

LOCAL HRValue, /* selected HR value */
      AL,      /* HR low alarm limit */
      AH,      /* HR high alarm limit */
      walk     /* repetition counter */

/* Initialize the Repository */
LINK Repository <- "$PatientMonitorRepository"
Repository:Init (PatientSize, Category)
    /* Declare the use of a function
       repository and initialize the
       repository link. This gives
       context-specific access to all
       available functions for the
       given PatientSize and Category.
    */

/* Test Case 1
   Action(s): Configure HR Alarm Limits to 50/90.
              Apply Signal HR 45.
   Expected:  Low limit alarm with text
              "**HR 45<50".
*/
HR:SetAlarmLimits (50, 90)
HR:SimulateValue (45)
HR:CheckAlarm (10, "** HR 45 < 50")
    /* Check for limit alarm after
       delay of 10 s. */

/* Test Case 2
   Action(s): Apply Signal HR 49
   Expected:  Low limit alarm remains with text
              "**HR 49<50" (alarm string is
              updated without delay).
*/
HR:SimulateValue (49)
HR:CheckAlarm (0, "** HR 49 < 50")

/* Test Case 3
   Action(s): Apply Signal HR 50.
   Expected:  Low limit alarm disappears.
*/
HR:SimulateValue (50)
HR:CheckNoAlarm (5)
    /* No HR alarm present after 5 s. */

/* Test Case 4
   Action(s): Select 5 different HR values
              between 50 and 80 and apply them.
   Expected:  No HR Alarm for each of the
              selected values.
*/
walk <- 1
    /* Randomly choose some HR values
       in the range 50/80, i.e., no
       alarm condition exists and
       therefore no HR alarm must
       be visible and audible. */
WHILE walk <= 5 DO
    HRValue <- RANDOM (50, 80, 1)
    HR:SimulateValue (HRValue)
    HR:CheckNoAlarm (0)
    walk <- walk + 1
ENDWHILE

END AlarmTest

An automatic test procedure for a random heart rate alarm
test in ATP might look like the following:

DEFINE RandomAlarmTest ( IN repetitions )

DESCRIPTION
  PURPOSE :
    Random HR Alarm Test
  SIGNATURE :
    RandomAlarmTest ( <repetitions> )
END DESCRIPTION

LOCAL AL,  /* low alarm limit */
      AH,  /* high alarm limit */
      walk,
      HRValue

/* Initialize the Repository */
LINK Repository <- "$PatientMonitorRepository"
Repository:Init (CHOOSE ( {"ADULT", "PEDIATRIC",
                           "NEONATE"} ),
                 CHOOSE ( {"OR", "ICU"} )
                )
    /* Declare the use of a function
       repository and initialize the
       repository link. Choose
       patient size and category
       randomly. This gives context-specific
       access to all available
       functions for the given
       PatientSize and Category. */

/* ... */

/* Randomly select valid HR
   alarm limits, then randomly
   select an HR value and check
   if the monitor reacts as
   expected. */
walk <- 1
WHILE walk <= repetitions DO
    HR:RandomSelectAlarmLimits (AL, AH)
        /* randomly select valid alarm
           limits */
    HR:SetAlarmLimits (AL, AH)
    HRValue <- RANDOM (20, 180, 1)
        /* select HR values between 20
           and 180 with step width 1. */
    HR:SimulateValue (HRValue)
    IF HRValue < AL THEN
        HR:CheckAlarm (10, "** HR " + HRValue + "<"
                           + AL)
    ELSIF HRValue > AH THEN
        HR:CheckAlarm (10, "** HR " + HRValue + ">"
                           + AH)
    ELSE
        HR:CheckNoAlarm (5)
    ENDIF
    walk <- walk + 1
ENDWHILE

END RandomAlarmTest

Even without familiarity with the syntax and semantics of
the ATP language, it can be recognized that the abstraction
level is higher than with plain code for the basic processors
(in our case, AutoTest and AutoCheck). It is also worth
noting that the automatic test procedures are not restricted
to a specific patient monitor. They describe in a general and
abstract way a heart rate alarm test for any patient monitor
with a limit alarm concept. The differences between specific
patient monitors are in the basic processor interfaces and in
the primitive functions.
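
The separation between an abstract test and monitor-specific primitives can be sketched in Python as follows. This is an analogy, not ATP: the `HRPrimitives` interface, its method names, and the `FakeMonitor` stand-in are all hypothetical. A real binding would drive the basic processors instead of computing the alarm text itself.

```python
from typing import Optional, Protocol


class HRPrimitives(Protocol):
    """Primitive functions each monitor binding has to provide."""
    def set_alarm_limits(self, low: int, high: int) -> None: ...
    def simulate_value(self, hr: int) -> None: ...
    def current_alarm(self) -> Optional[str]: ...


class FakeMonitor:
    """Stand-in monitor with an idealized limit alarm."""
    def __init__(self) -> None:
        self.low, self.high, self.hr = 0, 0, 0

    def set_alarm_limits(self, low: int, high: int) -> None:
        self.low, self.high = low, high

    def simulate_value(self, hr: int) -> None:
        self.hr = hr

    def current_alarm(self) -> Optional[str]:
        if self.hr < self.low:
            return f"** HR {self.hr} < {self.low}"
        if self.hr > self.high:
            return f"** HR {self.hr} > {self.high}"
        return None


def low_alarm_test(monitor: HRPrimitives) -> bool:
    """Abstract test: written once, run against any binding."""
    monitor.set_alarm_limits(50, 80)
    monitor.simulate_value(45)
    if monitor.current_alarm() != "** HR 45 < 50":
        return False
    monitor.simulate_value(50)
    return monitor.current_alarm() is None


print("passed" if low_alarm_test(FakeMonitor()) else "failed")
```

Only the class implementing the primitives changes from one monitor to the next; `low_alarm_test` stays untouched.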

The ATP Concept 

ATP consists of two major functional elements (see Fig. 3):
a set of tools to maintain a function repository and an ATP
language interpreter.

The tools give the user adequate access to the function repository,
which contains well-documented, well-tested ATP
functions that can be reused. This effectively reduces redundancy
and increases productivity (see the next section,
"Working with the Function Repository"). The tools can be
grouped into:

• File-oriented tools. Check functions into and out of the
function repository, compare two versions of a function,
etc.

• Repository query functions. Obtain information about available
functions, about an interface of a function, etc.

• Repository administration functions. Administer and maintain
the structure of the repository. Allow archival and retrieval
of the repository. These functions are only accessible
by the repository administrator.

With these tools and a text editor, a programmer writes ATP
functions by reusing existing functions from the repository.
When the function repository is well-structured and offers
reliable functions on an adequate abstraction level, it is easy
even for an inexperienced programmer to write functions,
as shown in the example above.



Interpreter 

The core of the system is the ATP language interpreter. The
interpreter requires an input file and an output file. The
input file is an ATP function. In contrast to other common
high-level languages, ATP requires one file per function. One
advantage of this approach is that each ATP function is executable.
There is no explicit syntactical distinction between
a main routine and a subroutine. Any function can be the
execution starting point and can call any other function. There
is no explicit function hierarchy. The hierarchical structure
is provided independently by the repository structure.

Another advantage is that, because each file contains only
one function, it is much easier to administer the functions in
the function repository. To fulfill structural requirements the
functions can be grouped by any criteria. The heart rate
example above uses functions that all belong to the logical
functional group HR.

The interpreter output data can be written into an output
file. A powerful feature of the language is its ability to integrate
format processors. A format processor is a problem-domain-specific
process (a basic processor) that can be
integrated into the interpretation process. In the patient
monitor testing example, AutoTest and AutoCheck are format
processors. The ATP language offers syntactical elements
to establish a communications channel to a format
processor so that within the ATP code the user can send
any information to the processor. The format processor can
send back information to the ATP interpreter, which then
can be sent to another format processor or logged to the
output file. The creation of this ATP adapter interface is an
easy task, thanks to an API that enables a programmer to
implement this communication interface with ATP. If an
integrated format processor is general-purpose, it can be
offered to all ATP programmers. A good example is KSH, a
format processor that enables ATP programmers to integrate
Korn shell commands within ATP code. The format
processor concept and the abstraction facilities of the ATP
language offer the programmer the means to model the
problem domain in an adequate and flexible way.

Working with the Function Repository 

The following short tour illustrates some of the tools that 
are available to handle function repositories. Suppose that a 
programmer wants to know which functions in the repository 
are available for dealing with heart rate operations. Typing: 

LibIndex
Group: "HR"

will, for example, produce the output:

CheckAlarm
CheckNoAlarm
SimulateValue
SetAlarmLimits
RandomSelectAlarmLimits



To get information about the interface of the SetAlarmLimits
function, the programmer can type:

TellMe
Group: "HR"
Functions: "SetAlarmLimits"

and will get:
************** HR:SetAlarmLimits **************
*
* purpose: configure the Heart Rate lower and
*          upper alarm limits.
*
* signature: SetAlarmLimits ( <lower alarm limit>,
*                             <higher alarm limit>)



The heart rate test example above uses this and other functions.
At the beginning of the function a link statement
declares an access path to a given function repository. Then
the function repository is initialized. This initialization function
introduces all available functional groups given a specific
system context. At that point the programmer is able
to call the functions, for example HR:SetAlarmLimits, without
knowing implementation details or physical locations. It is
possible to check out a function for enhancement or maintenance
purposes. It is also possible to check a function into
the repository so that all test engineers can use the new
function.
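
The repository workflow described in this section can be mimicked with a small in-memory Python sketch. The class and its methods (`check_in`, `index`, `tell_me`, `call`) are hypothetical names chosen to echo the LibIndex and TellMe tools; the real repository is file-based and is not reproduced here.

```python
class FunctionRepository:
    """Toy repository: functions are stored per group with a purpose text."""

    def __init__(self):
        self._groups = {}

    def check_in(self, group, name, func, purpose):
        """Make a function available to all users of the repository."""
        self._groups.setdefault(group, {})[name] = (func, purpose)

    def index(self, group):
        """List the function names of a group (in the spirit of LibIndex)."""
        return sorted(self._groups.get(group, {}))

    def tell_me(self, group, name):
        """Return the documented purpose (in the spirit of TellMe)."""
        return self._groups[group][name][1]

    def call(self, qualified_name, *args):
        """Call 'Group:Function' without knowing where it is stored."""
        group, name = qualified_name.split(":")
        return self._groups[group][name][0](*args)


repo = FunctionRepository()
repo.check_in("HR", "SetAlarmLimits", lambda low, high: (low, high),
              "configure the Heart Rate lower and upper alarm limits.")
repo.check_in("HR", "CheckAlarm", lambda delay, text: (delay, text),
              "check for an HR alarm text after a delay.")

print(repo.index("HR"))
print(repo.tell_me("HR", "SetAlarmLimits"))
print(repo.call("HR:SetAlarmLimits", 50, 80))
```

The caller addresses a function only by group and name, which is the property the text attributes to the link and initialization statements.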

Format Processors 

Fig. 4 presents the possibilities and the flexibility of format
processors. Each format processor consists of two parts: a
basic processor and an ATP adapter. The basic processor is
a proprietary part, that is, any executable code written by a
programmer. Typically the basic processors are on a low
abstraction level. The ATP adapter is the interface to ATP
that allows data to be sent to ATP and received from it. This
functionality is encapsulated and offered as an API.

A format processor can be used within ATP in the following
way:



FORMAT MyFormatProcessor <- "$MY_FP EXECUTABLE"
/* Declare the use of a format processor. */

BEGIN [ MyFormatProcessor ]

END [ MyFormatProcessor ]
/* Use the format processor, sending and receiving
   information. */



First, a specific syntactical construct introduced with the
keyword FORMAT is used to declare the use of a format processor.
It is then the task of ATP to control and to communicate
with the format processor. All information enclosed in
the syntactical bracket BEGIN [MyFormatProcessor] and END
[MyFormatProcessor] represents a code template for the named
format processor. ATP generates the actual code block from
this code template by substituting the actual parameter values
for the code template parameters. Then this code block
is sent to the format processor for immediate execution. The
format processor receives the code by calling API functions
provided by the ATP adapter. The proprietary part of the
format processor processes the received information. The
format processor can send back information to ATP. ATP
receives this information and logs it to the output file or
redirects it to another format processor.

Remote Format Processor. If a programmer needs to integrate a
format processor on another machine, for example on a PC running
Windows NT, this can be specified in the FORMAT declaration. No
additional effort is required for the programmer to establish a
remote format processor. The adaptation for remote control is done
by ATP automatically by adding a distribution adapter.

Concatenated Format Processor. Another feature of the ATP
language is the ability to concatenate existing format
processors.



FORMAT X <- 

FORMAT Y <- 

FORMAT Z <- X | Y

/* Concatenate format processors X and Y to Z. 
This is similar to UNIX pipes. */ 



BEGIN [ Z ] 



END [ Z ] 

The code block between BEGIN [Z] and END [Z] is first sent to
format processor X. The output of format processor X is sent to
format processor Y. For format processor Y it makes no difference
where the information comes from, that is, the concatenation is
mediated by ATP automatically. Format processor Y sends its output
back to ATP.
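The concatenation can be pictured as ordinary function composition. The following Python sketch is only an illustration and not ATP code: it models a format processor as a text-to-text function, and the names `concatenate`, `processor_x`, and `processor_y` are hypothetical stand-ins for real executables that would communicate with ATP through their ATP adapters.

```python
from functools import reduce
from typing import Callable

# Assumption for illustration: a format processor is modeled as a
# function from text to text. Real format processors are separate
# executables that talk to ATP through the ATP adapter.
Processor = Callable[[str], str]


def concatenate(*processors: Processor) -> Processor:
    """Chain processors so that each one consumes the previous output."""
    def chained(code_block: str) -> str:
        return reduce(lambda text, proc: proc(text), processors, code_block)
    return chained


def processor_x(text: str) -> str:
    """Hypothetical processor X: upper-case the code block."""
    return text.upper()


def processor_y(text: str) -> str:
    """Hypothetical processor Y: number each line of its input."""
    lines = text.splitlines()
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))


# Corresponds to: FORMAT Z <- X | Y
processor_z = concatenate(processor_x, processor_y)

if __name__ == "__main__":
    # The block is sent to X first; Y only ever sees the output of X.
    print(processor_z("set limits\ncheck alarm"))
```

As in the text, the second processor needs no knowledge of where its input comes from; the chaining is done entirely by the mediator.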

ATP in the Patient Monitor Test Environment 

The concept behind ATP eased its integration into the patient
monitor test environment. The impetus to develop this concept came
from our experience with the test environment, as described at the
beginning of this article. But the concept is more general. It is
not restricted to the patient monitor test environment. ATP can be
used to attack many different problems.




Fig. 4. Format processor functional blocks (basic processor and
ATP adapter).
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Fig. 5. Current ATP integration 
in the patient monitor test envi- 
ronment (phase I). 



The integration of such a tool into an existing environment is a
challenging task. ATP, like every tool, requires some effort to
build up the necessary infrastructure, to support the tool, and to
learn the new language. A step-by-step, three-phase integration of
ATP in the test process was planned.

Phase I. Develop a patient-monitor-relevant repository structure
that is easy to use and maintain. In parallel, implement a set of
primitive functions to fill the repository. Then test the
structure on new patient monitor functionality. In this phase ATP
does not specify any format processor executable, that is,
AutoTest and AutoCheck are not integrated as format processors.
ATP writes the code block immediately to the output file. The
generated code is then processed in a postprocessing step.

Phase II. Enhance AutoTest and AutoCheck with ATP adapters so that
they can be integrated as format processors. Then the same
functions used in phase I can be executed immediately without the
postprocessing steps needed in phase I. The test tools are invoked
by ATP.

Phase III. Complete the function repository and migrate, step by
step, the existing test package to ATP functions. Then the testing
package can be used again for new patient monitor products simply
by replacing the format processors by new ones and by substituting
some primitive functions.

Currently the phase I integration is completed (see Fig. 5). A
repository structure has been proposed and evaluated in some
projects. In these projects test engineers use ATP to automate the
tests. ATP generates AutoTest and AutoCheck code, which is then
passed to AutoTest and AutoCheck for execution. For phase II
integration, only the declaration of the AutoTest and AutoCheck
format processors will change. They will then specify integratable
AutoTest and AutoCheck format processors. This phase is currently
in progress. Phase III has been started.

ATP Integration in Phase I: An Example

The following ATP function illustrates how ATP generates
AutoCheck code. This function is the CheckAlarm function called in
the heart rate alarm test used in the example presented earlier.

DEFINE CheckAlarm ( IN AlarmDelay TYPE IN
                        { "REAL" } ,
                    IN AlarmString TYPE IN
                        { "STRING" }
                  )

DESCRIPTION 
PURPOSE : 

Check if alarm is present after a specified 
delay. 

SIGNATURE : 

CheckAlarm ( <AlarmDelay> , <AlarmString> ) 

END DESCRIPTION 

FORMAT AutoTest <- " " 

FORMAT AutoCheck <- " " 

/* At the moment AutoTest and AutoCheck
are not really format processors. The
declaration of AutoTest and AutoCheck
does not specify any format processor
executable. In this case ATP writes
the code block immediately to the
output channel, i.e. ATP generates
AutoTest/AutoCheck code. */
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LOCAL sound 

IF "***" == AlarmString [1, 3] THEN

/* Is it a red alarm? */ 
sound <- "red" 
ELSE 

/* NO ==> yellow alarm */ 
sound <- "c_yellow" 
ENDIF 

/* Very critical alarms are announced as 
red alarms whereas less critical 
alarms are announced as yellow alarms. 
Parallel to the visible colored alarm 
string a corresponding sound is 
audible, i.e. for red alarms a red 
alarm sound is audible and for yellow
alarms a c_yellow alarm sound is
audible. */

BEGIN [ AutoCheck ]

^ Verify begin

^ Alarm @ ( 1, AlarmString) within
      (@( 1, AlarmDelay) ,NaN) ;

^ sound is @ (1, sound);

^ Verify end
END [ AutoCheck ]

/* AutoCheck code generation. 

The code template includes ATP 
variables, which will be evaluated 
at run time. */ 

END CheckAlarm 
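
To make the template substitution concrete, the following Python sketch mimics what CheckAlarm does in phase I: it selects the sound from the alarm string and fills the AutoCheck template with the actual values. It is an illustration under stated assumptions, not the ATP implementation. In particular, the function name `check_alarm`, the assumption that red alarm strings begin with three asterisks, and the exact layout of the generated lines are hypothetical.

```python
def check_alarm(alarm_delay: float, alarm_string: str) -> str:
    """Return an AutoCheck block similar to what CheckAlarm would emit."""
    # Assumption: red alarm strings start with "***"; anything else is
    # treated as a yellow alarm, mirroring the IF/ELSE in the listing.
    sound = "red" if alarm_string[:3] == "***" else "c_yellow"

    # Substitute the actual values for the template parameters.
    lines = [
        "^ Verify begin",
        f'^ Alarm "{alarm_string}" within ({alarm_delay}, NaN);',
        f"^ sound is {sound};",
        "^ Verify end",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(check_alarm(10.0, "*** HR 120 > 80"))
```

Because no format processor executable is declared in phase I, the text produced this way would simply be written to the output file for later processing.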



Discussion 

Although the current ATP integration is only the first phase, ATP
has proved to be a powerful tool for attacking and solving complex
testing problems that otherwise would not have been solved in the
same time frame. Like every new tool, at the beginning some effort
is required to learn the language. Also, the test engineers have
to implement a set of primitive functions to build a powerful
function repository. Nevertheless, our experience has shown that
productivity increased significantly and that ATP helped to ensure
the predictability of product releases.

After a few days of use the test engineers felt comfortable
enough to develop their first automatic tests with ATP and were
able to use the function repository.

Tests are much more sophisticated and effective than before. The
same tests written directly in AutoTest/AutoCheck code would have
probably required three times more development time without
reaching the same degree of reliability, flexibility, and
maintainability. The test engineers using ATP used the increased
productivity to think about better test designs.



Failures have been found earlier because of higher test coverage,
especially from the use of random test data generation. These
failures would not have been detected in the validation phase with
the existing static test. The risk of missing a failure is
therefore reduced by ATP.

The redundancy of the tests is much lower. Test engineers are now
able to adapt their test procedures rapidly to changed system
behavior. In most cases they just have to update some constants.

The higher abstraction level of the test procedures enables the
test engineer to use the same test procedures to test new patient
monitor products. The adaptation requires the substitution of some
low-level primitive functions and the format processors.

Implementation 

The ATP interpreter is implemented in C on a workstation running
the HP-UX operating system. Most of the repository tools have been
written in the ATP language itself. This illustrates that ATP is
not only a language for formulating test procedures.

The architecture follows the classical compiler architecture. The
front end with lexical analysis, syntactical analysis, and
semantic analysis is similar to other compilers for high-level
formal languages. The back end consists of the code generation
module and the communication module, which manages the format
processor communication and other functions.

Conclusion

The new ATP language bridges the gap between high-level test
design and low-level automatic test procedures. The integration of
ATP into the test environment has increased productivity and
reduced redundancy. More important, the quality of the testing
process has increased with the use of this abstract high-level
programming language. Migration of the test procedure set to new
products is now much easier because most of the code can be
reused.
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An Automated Test Evaluation Tool 



The AutoCheck program fully automates the evaluation of test protocol 
files for medical patient monitors. The AutoCheck output documents that 
the evaluation has been carried out and presents the results of the 
evaluation. 

by Jorg Schwering 



The AutoCheck program extends the automated test environment
described in this journal in 1991.¹ It fully automates the
evaluation of the test protocol files generated by the AutoTest
program.

Fig. 1 is a brief summary of the 1991 system. The main part of the
figure shows the main elements of the AutoTest program and its
environment. The AutoTest program reads and interprets commands
line by line out of a test script (a test procedure). Basically
there are three types of commands:



• Commands to simulate user input on the monitor under test 
(keypusher) 

• Commands to control the signal simulators, which play the 
role of a critically ill patient 

• Commands to request and log output data from the monitor 
under test (reactions to keypresses and signals applied). 

The execution of all commands and the data received from 
the monitor are logged into a protocol file. Fig. 2 is an exam- 
ple of a test protocol file (one test case only). 



[Figure: test procedures derived from reference and design
specifications; AutoTest with patient simulators generating
physiological signals; monitor output consisting of waves,
numerics, alarms, inops (technical alerts), patient admin, monitor
status, and task window content; automated verification and
evaluation by AutoCheck.]

Fig. 1. Automated test environment for medical patient monitoring
systems. AutoCheck, the automatic verification and evaluation tool
shown in the upper right corner, is a recent addition.

June 1997 Hewlett-Packard Journal 103

© Copr. 1949-1998 Hewlett-Packard Co.



[Figure: a time-stamped test protocol file for one test case, with
the following parts annotated: comment lines and time stamps;
keypusher commands (Merlin ...) to set the HR alarm limits to
50-80; a simulator command to set the heart rate to 120; an
additional AutoTest command (wait); the expected output for the
human evaluator (verify that HR = 120 and that the HR alarm is
present); the command to gather data (mecif tune HR ECG-CH1 3);
and the data blocks read, which contain the numeric value (heart
rate) and the alarm string.]

Besides the alarm string and the HR numeric, much related data
such as units or the alarm limits is received. Most of these
attributes should also be checked.

Fig. 2. An example of a test protocol file for one test case.

The upper left block in Fig. 1 indicates how the test scripts are
derived from the product specifications. More information on the
testing process can be found in the article on page 89. A test
script consists of a startup configuration block, which configures
the monitor to a defined startup condition, and the test cases.
Each test case consists of three parts:

• The actions (keypusher and simulator commands)

• The description of the expected results

• The AutoTest data request commands.

The upper right corner in Fig. 1 shows the automatic evaluation
tool, which is the subject of this article.

Manual Evaluation 

In the test environment of Fig. 1, the test engineer had a tool
that ran sequences of test scripts in batch mode, completely
unattended, overnight or on weekends. With the introduction of
AutoTest, the main effort for the test engineer shifted from test
execution to test evaluation, which was done with the help of an
editor. The protocol files generated by AutoTest (see Fig. 2) are
ASCII text files that are very often larger than one megabyte
(some large tests have reached a size of more than 100 megabytes).

The evaluation task was not only tedious and time-consuming 
but also error-prone and dependent on the knowledge, expe- 
rience, and even the mental alertness of the evaluator. As a 
consequence, the manual checks were restricted to a mini- 
mum for each test case, which typically meant that unex- 
pected attributes were not detected. Furthermore, for tests 
that needed to be repeated, the evaluation was normally 
restricted to a few (one to three) selected repetitions. 
Statistical tests, such as adjusting an alarm limit randomly 
and checking the alarm string, generate particularly large 
protocol files that are difficult to evaluate manually, leading 
the test engineer to reduce the number of test cases to a 
minimum. 
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Automated Test without AutoCheck 



Automated Test with AutoCheck




Goals for an Automatic Test Evaluation Tool 

Because of these problems, an investigation was started on 
a tool that could replace the human evaluator. The goals for 
this automatic evaluation tool were: 

• To relieve the test engineer of tedious, time-consuming
manual evaluation and thereby increase efficiency

• To avoid overlooking discrepancies

• To get the test results faster by quicker evaluation

• To increase test coverage through side-effect checks

• To make evaluation more objective (not tester dependent)

• To allow conditional checks (flexibility)

• To automate local language regression tests (see the article
on testing localized software in this issue).

Use Model 

The basic use model (see Fig. 3) is the replacement of manual
evaluation with the automatic evaluation tool. The evaluation of
the protocol file runs after the test has finished. The test files
already contain the expected results coded in a formal language
readable by AutoCheck.

Test execution and evaluation now consists of the following
steps:

1. Write the AutoTest test script including the expected results
in AutoCheck format. The basic test script layout as described
above stays the same. The only differences are that some AutoCheck
definitions such as tolerances (see "AutoCheck Features and
Syntax" below) are added to the startup block and that the
description of the expected results has to follow the AutoCheck
format.

2. Run this test script through the AutoCheck syntax check to
avoid useless AutoTest runs.

3. Execute the test script with AutoTest as usual. The expected
results (AutoCheck statements) are treated by AutoTest as
comments, which means that they are only copied into the protocol
file together with a time stamp.

4. Run the protocol file through the AutoCheck evaluation check,
which includes a syntax check. AutoCheck generates a diff file
reporting the deviations from the expected results (errors) and
warnings for everything that couldn't be evaluated or is
suspicious in some other way (for details see "AutoCheck Output"
below).

5. If and only if AutoCheck reports errors or warnings, check the
protocol file to find out whether the deviation is caused by a
flaw in the test script or a bug in the patient monitor under
test.

Fig. 3. The use model for AutoCheck replaces manual evaluation
with the automatic evaluation tool.

Architecture 

We first conducted a feasibility study, which investigated
different architectural approaches and implementation tools. The
first approach was in the area of artificial intelligence, namely
expert systems and language recognition (this would be expected
for an investigation started in the early 1990s). It soon became
apparent that protocol file evaluation is basically a compiler
problem. The languages and tools investigated were Prolog/Lisp,
sed/UNIX shell, lex/yacc, C, and a C-style macro language for a
programmable editor. We came to the conclusion that a combination
of lex/yacc and C would lead to the easiest and most flexible
solution.

Fig. 4 shows the AutoCheck architecture. The protocol file is
first run through a preprocessor, which removes all lines
irrelevant to AutoCheck, identifies the different AutoTest
interfaces, and performs the local language translations.
Thereafter it is analyzed by a combination of a scanner and a
parser. We implemented specialized scanner/parsers for the
AutoCheck metalanguage and the data provided by the different
patient monitor interfaces. The AutoCheck statements and the
AutoTest data are written into separate data structures. A third
data structure holds some control parameters such as the accepted
tolerances (see "AutoCheck Features and Syntax" below). After each
data package, which is the answer to one AutoTest data request
command, the compare function is started. The compare function
writes all deviations into the error file.

Basically, AutoTest and AutoCheck recognize two types of data
requests: single tunes, which respond with exactly one data set
for each requested message, and continuous tunes, which gather
data over a defined time interval.
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Expected 
Results

Control
Parameters

Fig. 4. AutoCheck architecture.

In the monitor family under test all important data has an update
frequency of 1024 ms. AutoTest groups all data messages received
within a 1024-ms frame into one data block and AutoCheck treats
each data block of a continuous tune like a data package of a
single tune.

All AutoCheck statements are then verified against each 
data block. The AutoCheck statements remain valid and 
are applied to the next data block until they are explicitly 
cleared or overwritten by a new AutoCheck block. 
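
The block-wise evaluation described above can be sketched as follows. This Python fragment is an illustration only: the message representation as `(timestamp_ms, payload)` pairs, the alignment of frames to timestamp 0, and the modeling of AutoCheck statements as named predicates are all assumptions made for the example and are not taken from the AutoCheck implementation.

```python
from collections import defaultdict

FRAME_MS = 1024  # update period of the monitor data, as stated in the text


def group_into_blocks(messages):
    """Group (timestamp_ms, payload) pairs into 1024-ms data blocks.

    Assumption: frames are aligned to timestamp 0. The article does not
    say how AutoTest aligns its frames.
    """
    blocks = defaultdict(list)
    for timestamp_ms, payload in messages:
        blocks[timestamp_ms // FRAME_MS].append(payload)
    # Return the blocks in time order.
    return [blocks[index] for index in sorted(blocks)]


def evaluate(blocks, statements):
    """Apply every active statement to every data block.

    The statements stay valid from block to block, which models the
    rule that they apply until cleared or overwritten.
    """
    deviations = []
    for block_number, block in enumerate(blocks):
        for name, predicate in statements.items():
            if not predicate(block):
                deviations.append((block_number, name))
    return deviations


if __name__ == "__main__":
    messages = [(0, "HR 120"), (500, "ALARM HR"), (1100, "HR 120"), (2050, "HR 121")]
    blocks = group_into_blocks(messages)
    statements = {"hr_is_120": lambda block: "HR 120" in block}
    print(evaluate(blocks, statements))
```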

AutoCheck Features and Syntax 

The AutoCheck features and syntax are best described using the
example shown in Fig. 5. The numbers below correspond to the
numbers in the left column of Fig. 5.

1. All AutoCheck statements are preceded by a caret (^) and are
treated as comments by AutoTest. As mentioned above under
"Architecture," AutoCheck statements are grouped into blocks.
Each block is enclosed by the two statements Verify Begin and
Verify End.

2. There is a set of AutoCheck statements that enables the user to
verify all data that can be read by AutoTest (numerics, alerts,
sound, wave data, task window texts, etc.). An example of a
numerical value is temperature, including all of its accompanying
attributes such as units (°C) and alarm limits. In this example
the value of the temperature numeric is expected to be 37.0°C.

3. Verify statements can be combined with:

• A negation, for example to check the absence of an alarm

• Timing conditions, for example to verify that an alarm delay
is within its specified range.



In this example it is expected that in the time interval from 5
seconds to infinity (NaN) there is no alarm for blood pressure.
This is a typical test case in which there was an alarm and the
simulated measurement has been reset between the alarm limits, the
object being to check that the alarm disappears within a defined
time.

4. For all numerical values (measurements), including those in the
alarm string, a tolerance can be defined to compensate for
simulator tolerances. The tolerances are defined outside the
Verify block in an additional block. Although the user can change
the tolerances as often as desired, they are typically defined
once at the beginning of a test procedure and then used for the
whole test procedure. In this example, all values in the range
from 1% below 37.0°C to 1% above 37.0°C (approximately 36.7 to
37.3°C) would be accepted as correct for the Temp1 parameter.

5. There are special combinations, such as a numeric value and an
alarm string. For instance, in the monitor family under test an
alarm message typically indicates that the alarm limit has been
exceeded. The alarm limit is also included in a numeric message
along with its attributes. The command alarm "HR" > al_max allows
the tester to compare the alarm limit in the alarm message with
the alarm limit in the numeric message (as opposed to checking
both messages against a fixed limit). This feature is mainly
useful for statistical tests.

6. Simple control structures (if, and, or) can be used to define
different expected results for conditions that are either not
controllable by the test environment or are deliberately not
explicitly set to expand the test coverage. In the monitor family
under test some settings are dependent on the configuration (e.g.,
patient size). The simple control structures allow
configuration-dependent evaluation.

7. As a condition in an if statement, either flags, which have 
to be defined earlier in the test procedure, or an ordinary 
AutoCheck statement can be used. 



(4)     ^ Tolerance Definition
(4)     ^   "Temp1" : 1%;
(4)     ^ End Tolerance Definition
(1)     ^ Verify Begin
(2)(4)  ^   "Temp1"->value = 37.0;
(2)     ^   "Temp1"->unit = C;
(3)     ^   not alarm for "Press1"
(3)     ^       within ( 5 , NaN ) ;
(5)     ^   alarm "HR" > al_max;
(6)(7)  ^   if Neonate
(6)     ^   then
(6)     ^     "HR"->al_min = 30;
(6)     ^   endif;
(6)(7)  ^   if value of "Pat.Size" is "Adult"
(6)     ^   then
(6)     ^     "HR"->al_min = 15;
(6)     ^   endif;
(8)     ^   write "Check user input
(1)     ^ Verify End;

Fig. 5. An example of expected results written in the AutoCheck
language. The numbers at the left refer to the paragraphs in the
article that describe these statements.
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AutoCbeck 2.02, Jan 3 1996. 17:01:69 
Error: 34, 42 > "IBt"->Value isn't correct
Error: 34, 45 > "HR"->Value isn't correct
Error: 36, 48 > Alarm "HR" doesn't fall in interval (5,NaN)
Error: 35, 51 > "HR"->Unit: 'unit' <-> 'bpm'
Warning: 34, 55 > numeric "HS" doesn't exist
Warning: 33, 55 > numeric "St" doesn't exist
Warning: 36, 55 > No AlarmLimits available for "W
Error: 34, 58 > "HR"->Value isn't correct
Error: 36, 63 > incorrect alarm: measurement




Fig. 6. An example of AutoCheck output.

8. AutoCheck provides a command to write a comment into the output
file. This can be used to instruct the user to check something
manually (e.g., a user input string).
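
The tolerance check of item 4 and the timing condition of item 3 amount to two small comparisons. The Python sketch below is a hedged illustration: the function names are hypothetical, the tolerance is assumed to be a percentage of the expected value, and the interval bounds are assumed to be inclusive, none of which the article states explicitly.

```python
import math


def within_tolerance(expected: float, actual: float, percent: float) -> bool:
    """True if actual lies within +/- percent of expected."""
    margin = abs(expected) * percent / 100.0
    return abs(actual - expected) <= margin


def within_interval(elapsed_s: float, start_s: float, end_s: float) -> bool:
    """True if elapsed_s lies in the interval from start_s to end_s.

    A NaN upper bound means "no upper bound", as in within (5, NaN).
    """
    if math.isnan(end_s):
        return elapsed_s >= start_s
    return start_s <= elapsed_s <= end_s


if __name__ == "__main__":
    # "Temp1" : 1% with an expected value of 37.0 gives a margin of 0.37.
    print(within_tolerance(37.0, 36.7, 1.0))
    print(within_tolerance(37.0, 37.5, 1.0))
    # An observation 7 s after the event falls inside (5, NaN).
    print(within_interval(7.0, 5.0, math.nan))
```

A negated statement such as "not alarm ... within (5, NaN)" would then report a deviation whenever an alarm is observed at a time for which `within_interval` is true.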

AutoCheck Output 

Fig. 6 is an example of AutoCheck output. AutoCheck generates
seven different output types:

• Evaluation Error. The expected data and the received data
don't match.

• Evaluation Warning. AutoCheck couldn't determine whether
the data is correct (e.g., data missing).

• AutoTest Error. Errors reported by AutoTest are mirrored in the
output file to make them visible to the test engineer, who only
looks at the protocol file in case of reported errors.

• Syntax Error. The interpretation of the AutoCheck syntax
failed.

• Syntax Warning. The AutoCheck syntax could be interpreted,
but is suspected to be incomplete or wrong.

• Data Error. The AutoTest data couldn't be interpreted
correctly. This indicates either a corrupted protocol file or an
incompatibility of AutoCheck and AutoTest versions.

• Write. This is a user-defined output. It enables the user to
mark data that should be checked manually, such as user input at a
pause statement.

The user can choose between four different warning levels for the
syntax and evaluation warnings and can switch individual warnings
on or off.



The output generated by AutoCheck has the following format:

ErrorType : statementline, dataline >
descriptive text

Thus, both the line containing the AutoCheck statement and the
line containing the data are indicated.

If the output is written into a file, each line is preceded by the
filename(statementline). This is the same format as used by many
compilers, and therefore the built-in macros of many programming
editors can be used in combination with AutoCheck. This means that
errors can be reviewed in much the same way that a source file is
debugged after compilation using an editor pointing to the source
code errors.

At the end of the evaluation, AutoCheck gives the test engineer a
quick overview of the result by providing a table showing how many
output messages of each type have been generated. Whereas the
evaluation errors indicate bugs either in the product or in the
test script, the other output messages indicate potential problems
in the test execution or evaluation process.
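
Because the messages follow a fixed format, they are easy to process mechanically, for example to rebuild the summary table. The following Python sketch assumes that each message occupies a single line of the form "ErrorType: statementline, dataline > text", as in Fig. 6. The regular expression and the function names are illustrative assumptions and are not part of AutoCheck.

```python
import re
from collections import Counter

# Pattern for "ErrorType: statementline, dataline > descriptive text".
MESSAGE = re.compile(
    r"^(?P<kind>[A-Za-z ]+?)\s*:\s*"
    r"(?P<statement>\d+)\s*,\s*(?P<data>\d+)\s*>\s*"
    r"(?P<text>.*)$"
)


def parse_message(line):
    """Split one output line into (kind, statement line, data line, text)."""
    match = MESSAGE.match(line.strip())
    if match is None:
        return None  # not an AutoCheck message in the assumed format
    return (
        match["kind"],
        int(match["statement"]),
        int(match["data"]),
        match["text"],
    )


def summarize(lines):
    """Count the messages of each type, like the table printed at the end."""
    parsed = (parse_message(line) for line in lines)
    return Counter(item[0] for item in parsed if item is not None)


if __name__ == "__main__":
    sample = [
        'Error: 34, 58 > "HR"->Value isn\'t correct',
        "Error: 36, 63 > incorrect alarm: measurement",
        "this line is not a message",
    ]
    print(summarize(sample))
```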

The AutoCheck output documents both that the evaluation has been
carried out and the result of the evaluation, which for medical
products are important for regulatory approvals and audits.
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Platforms 

AutoCheck and AutoTest run on different platforms. AutoTest runs
on a DOS-based PC, which is very appropriate as a test controller
because of the inexpensive hardware, an operating system that
doesn't require dealing with tasking conflicts, and the
availability of interface cards (the interface to the medical
network is available as a PC card only). AutoCheck runs on a
UNIX-based workstation because of the availability of lex/yacc and
the greater resources (memory and processing power). However, both
tools work on the same file system (a UNIX-based LAN server). The
user doesn't have to worry about the different file formats,
because AutoCheck automatically takes care of the format
conversions. It accepts both DOS and UNIX formats and generates
the output according to the detected protocol file format. Having
different machines for execution and evaluation has also not
proved to be a disadvantage for the test engineer.
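
One plausible reading of the format handling is that it concerns the line terminators of DOS and UNIX text files. The Python sketch below illustrates that reading only. It is an assumption that the difference is limited to line endings, and the function names are hypothetical.

```python
def detect_newline(raw: bytes) -> bytes:
    """Return the DOS terminator if the file uses it, else the UNIX one."""
    return b"\r\n" if b"\r\n" in raw else b"\n"


def read_lines(raw: bytes) -> list:
    """Decode the protocol file and split it on either kind of line ending."""
    return raw.decode("ascii").splitlines()


def write_lines(lines: list, newline: bytes) -> bytes:
    """Encode the output using the terminator detected in the input."""
    return b"".join(line.encode("ascii") + newline for line in lines)


if __name__ == "__main__":
    dos_file = b"09:00:35 first line\r\n09:00:36 second line\r\n"
    newline = detect_newline(dos_file)
    output = write_lines(read_lines(dos_file), newline)
    print(output == dos_file)
```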

Expandability 

The basic architecture of AutoCheck has proven to be flexible for
enhancements over time. Since the first release of AutoCheck we
have implemented many enhancements because of new product features
and because AutoTest provides additional data structures.

Validation 

The risk of the AutoCheck approach is that, if AutoCheck overlooks
an error (false negative output), the tester won't find the error.
An automatic evaluation tool is only useful if the tester can rely
on it, since otherwise, even if no errors were reported, the
tester would still have to look at the protocol file. Therefore,
the validation of an automatic evaluation tool is crucial to the
success of such a tool. For this reason a thorough test of the
tool was designed and every new revision is regression tested.
Changes and enhancements undergo a formal process similar to that
used for customer products.



Results 

The manual evaluation time for an overnight test of around one to
two hours has been reduced by the use of AutoCheck to less than a
minute. This means that the additional effort for the test
engineer for writing the expected results in the AutoCheck syntax
is compensated after three to five test runs. This depends on the
experience of the test engineer with AutoCheck (the normal
learning curve) and the nature of the test.

A positive side effect is that it is much easier for another test
engineer to evaluate the test.

AutoCheck also leads to bigger tests with an increased number of
checks for each test case, such as checks for side effects. Such
an automatic evaluation tool is also a prerequisite for
statistical testing. It would take too much time to evaluate all
these test cases manually. In other words, AutoCheck leads to
higher test coverage with lower effort for the test engineer.

Once relieved of a great deal of the more mechanical test
execution and evaluation activities, the test engineer has time to
work on new and better test approaches or possibilities for an
increased automation level. Over time this has led to enhancements
of both AutoTest and AutoCheck and to new tools like ATP (see the
article on ATP in this issue).
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Effective Testing of Localized 
Software 



Testing localized software is a complex and time-consuming task. With 
the help of the testing tools developed for HP patient monitors, local 
language validation for these products is fully automated. 

by Evangelos Nikolaropoulos, Jorg Schwering, and Andreas Pirrung



Localization plays a very important role in the successful
marketing of software all over the world. For medical devices
there are legal requirements to provide instruments and
accompanying documentation in the language of the healthcare
personnel who use them (as is the case in the European Union). It
is often forgotten that localized software is different software
from the original (most probably in English) that was used for
system integration and final validation. Localized software
undergoes a proper integration cycle (integration of software and
translated strings) and must be validated separately. The
complexity of this validation is obvious if one considers the
efforts required to check all error conditions and the
corresponding error messages (and to understand them) for software
in every language where the product is marketed.

The most common errors in localized software, assuming that the
translation is done by a professional translator for this language
and is correct, are:

• Missing strings (empty messages, parts of screen text missing,
menu selection items missing)

• Strings with wrong attributes (maximum length exceeded, a
possible crash cause) or strange characters filling up the
remainder of a field

• Wrong strings (not reflecting the intentions of the author for
this particular context)

• Various misspellings or violations of grammar rules applied to
the language produced through the combination of translated
strings by the software

• Strings not properly cleared in a text field before a new
string is displayed.
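
The first two error classes lend themselves to a mechanical check of the string tables. The following Python sketch is a hypothetical illustration: the dictionary-based table layout, the keys, and the length limits are invented for the example and do not describe the actual string tables of the patient monitor.

```python
def check_string_table(english, localized, max_lengths):
    """Return (key, problem) pairs for missing or overlong localized strings."""
    problems = []
    for key in english:
        text = localized.get(key, "")
        if not text.strip():
            # Covers both absent entries and empty messages.
            problems.append((key, "missing string"))
        elif len(text) > max_lengths.get(key, len(text)):
            # Overlong strings are a possible crash cause.
            problems.append((key, "maximum length exceeded"))
    return problems


if __name__ == "__main__":
    # All keys, strings, and limits below are made-up placeholders.
    english = {"alarm_off": "ALARMS OFF", "equip_malf": "NBP EQUIP MALF"}
    localized = {"alarm_off": "", "equip_malf": "X" * 20}
    max_lengths = {"alarm_off": 12, "equip_malf": 14}
    for key, problem in check_string_table(english, localized, max_lengths):
        print(key, problem)
```

The remaining error classes (wrong strings in context, grammar violations in combined strings, strings not cleared) depend on the running software and cannot be found by inspecting the tables alone.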

Local language testing in our laboratory is composed of two
steps: the verification of the translation and the validation
(regression testing) of the localized software.

To verify the translation, a translator goes over all possible 
screens, messages, help texts, printouts, and so on to check 
for translation errors. The difficulty here is that in most 
cases the translator is not a frequent user of I he device 
under lest, and needs assistance in operating ihe medical 
instrumenl and generating all possible string combinations. 

The aim of validation (regression testing) of the localized 
software is (0 prove dial Ihe localization has nol negatively 
affected Ihe functionality and performance of the instru- 
ment Additional attention must be paid to typical localiza- 
tion errors (overflows or garbage generation). 

Automated Local Language Validation 

With the help of the testing tools developed for our patient
monitors (see Fig. 1 and the accompanying articles in this
issue), local language validation is a fully automated process:



[Diagram: Test Procedure (English); Localized Software; Softkey Table
(Menu Selections); Test Procedure (English plus Localized Softkeys);
Text String Tables (English and Local); Test Results with Localized
Strings]

Fig. 1. Local language validation process for patient monitors.
(a) Extract of a Test Procedure for Blood Pressure as It Is Used for Tests of English Software

    * alarm suspended -> alarms not suspended
    merlin mainscrn
    mecif skey -kalarmvol                      Press a hardkey
    merlin "SwitchOnAlarms"                    Make a selection from a menu
    * ========================================
    A Verify begin
    A INOP "NBP EQUIP MALF" ;                  Verify statement (AutoCheck)
    A Sound is hard inop ;
    A Verify end

    A Verify begin
    A twprompt is "Problems with the pneumatic are detected" ;    Verify statement (AutoCheck)
    A Verify end

(b) The Procedure after Translation of Softkeys (Menu Selections) to Finnish

    * alarm suspended -> alarms not suspended
    merlin mainscrn
    mecif skey -kalarmvol                      Press a hardkey (no translation)
    merlin "HalytPaalle"                       Make a selection from a menu (translated)
    * ========================================
    A Verify begin
    A INOP "NBP EQUIP MALF" ;                  Verify statement (AutoCheck)
    A Sound is hard inop ;
    A Verify end

    A Verify begin
    A twprompt is "Problems with the pneumatic are detected" ;    Verify statement (AutoCheck)
    A Verify end
    * ========================================

(c) Protocol File after Test Run with Software in Finnish

    23:35:14 * alarm suspended -> alarms not suspended
    23:35:14 merlin mainscrn
    23:35:17 mecif skey -kalarmvol
    23:35:21 merlin "HalytPaalle"

    23:36:11 * ======================================================
    23:36:11 A Verify begin
    23:36:11 A INOP "NBP EQUIP MALF" ;
    23:36:11 A Sound is hard inop ;
    23:36:11 A Verify end
    23:36:11 * ======================================================
    Filter: NBP -NU. @
    23:36:13 (hard inop sound ARec : CS)
    23:36:13 I "NBP LAITEVIRHE "0-1 NBP -NU p= o--H               String translated

    23:36:23 * ======================================================
    23:36:23 A Verify begin
    23:36:23 A twprompt is "Problems with the pneumatic are detected" ;
    23:36:23 A Verify end
    23:36:23 * ======================================================
    Tuned : TWPROMPT,
    23:36:26 F " NBP " 8, 15
    23:36:26 P "NBP last calibration done 16 KES 94 15:32 " W 3 T &     Combined string only partly translated
    23:36:29 P "Pneumatiikassa on havaittu ongelmia " W 3 T             String translated
    23:36:31 EOT

Fig. 2. Example of the steps of the local language test process.
1. Translation tables called local language tables are prepared
from the native language support database containing all the
English strings and their corresponding translations.

2. A test package, a subset of the test procedures designed
for the regression testing of the original English version, is
compiled.

3. A copy of each test procedure out of this package is
translated into the local language by a tool developed by software
quality engineering called skey. The intention here is to replace
in the test procedures the selection menu items (softkeys) with
the corresponding localized terms. Of course, a test can be
executed by passing the position of a selection item (e.g.,
"press the second selection on the third menu") to the test
execution tool, but this approach has proved to be ineffective.
The issue here is not to test that the "second selection of the
third menu" works, because this was already tested in English,
but to prove that the "second selection of the third menu" is
translated correctly and, if selected, produces exactly the same
behavior as its English counterpart. By calling the selections by
their values and not by their positions we achieve higher test
coverage and we test functionality and translation at the same
time. Another argument for this approach is that the "second
selection on the third menu" may be configuration dependent (even
local configuration dependent) and therefore not accessible by a
position dependent test (e.g., it is still on the third menu but
in the sixth place). Thus, calls by value make test procedures
more robust.

4. The translated test procedure is passed to AutoTest and
is run on the localized software. The results are saved in
protocol files containing English verify statements (the
expected results), localized softkeys (selections), and
localized actual results (see the example in Fig. 2).

5. The protocol files are submitted to AutoCheck (see the
accompanying article on AutoCheck in this issue). First, the
AutoCheck preprocessor takes over the task of translating the
verify statements. It uses the local language tables to replace
the English text in the verify clauses with the localized text.
On the second pass these translated expected results are compared
with the localized actual results. Discrepancies are reported in
the normal way but with localized content.
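
The softkey translation and the two AutoCheck passes described above can be sketched in a few lines. This is an illustration under stated assumptions, not the actual skey or AutoCheck implementation, whose file formats are not described here. The dictionaries `SOFTKEYS` and `STRINGS`, the function names, and the simplified line format (a quoted menu selection on a `merlin` line) are all hypothetical; the example strings come from Fig. 2.

```python
import re

# Hypothetical local language tables (English -> Finnish), filled from Fig. 2.
SOFTKEYS = {"SwitchOnAlarms": "HalytPaalle"}
STRINGS = {
    "NBP EQUIP MALF": "NBP LAITEVIRHE",
    "Problems with the pneumatic are detected":
        "Pneumatiikassa on havaittu ongelmia",
}

QUOTED = re.compile(r'"([^"]*)"')


def translate_softkeys(lines, softkeys):
    """Replace quoted menu selections on 'merlin' lines by their localized
    values; selections stay calls by value, only the value changes."""
    out = []
    for line in lines:
        if line.startswith("merlin "):
            line = QUOTED.sub(
                lambda m: '"%s"' % softkeys.get(m.group(1), m.group(1)), line)
        out.append(line)
    return out


def translate_expected(expected, strings):
    """First pass: replace English expected text by the localized text."""
    return [strings.get(text, text) for text in expected]


def compare(expected, actual):
    """Second pass: return the expected strings not found in the actual output."""
    shown = {a.strip() for a in actual}
    return [e for e in expected if e not in shown]


procedure = ["merlin mainscrn", "mecif skey -kalarmvol",
             'merlin "SwitchOnAlarms"']
localized_procedure = translate_softkeys(procedure, SOFTKEYS)

expected = translate_expected(
    ["NBP EQUIP MALF", "Problems with the pneumatic are detected"], STRINGS)
actual = ["NBP LAITEVIRHE ", "Pneumatiikassa on havaittu ongelmia "]
discrepancies = compare(expected, actual)
print(localized_procedure, discrepancies)
```

Because the lookup is by value, a softkey that moves to another position in a menu does not affect the translated procedure, which is the robustness argument made in step 3.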

A special solution is also provided for Asian languages 
(simplified and traditional Chinese and Japanese), which 
use 16-bit codes. For these languages the hexadecimal 
equivalent for each character is used in the test procedures 
instead of the "drawn" character. This enables us to keep 
such characters in ASCII files (like the test procedures and
the protocol files) and use them with the test execution and 
evaluation tools. 
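
The idea of carrying 16-bit characters through ASCII files as hexadecimal codes can be illustrated as follows. This is only a sketch: the character encoding used in the monitors is not stated here, so UTF-16 (big-endian) is assumed purely for the example, and the function names are hypothetical.

```python
def to_hex_ascii(text):
    """Write each 16-bit code unit as four hexadecimal digits.
    Assumes UTF-16BE; the encoding used in the real product may differ."""
    data = text.encode("utf-16-be")
    return " ".join(data[i:i + 2].hex().upper()
                    for i in range(0, len(data), 2))


def from_hex_ascii(hex_text):
    """Inverse of to_hex_ascii: rebuild the string from its hex codes."""
    return bytes.fromhex(hex_text.replace(" ", "")).decode("utf-16-be")


encoded = to_hex_ascii("中文")
print(encoded)
print(from_hex_ascii(encoded) == "中文")
```

The encoded form contains only digits, the letters A to F, and blanks, so it can be stored in test procedures and protocol files and compared with ordinary string tools.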

The automated local language validation has dramatically 
improved the process of localized software release. It has 
reduced the effort for local language testing for a new 
patient monitor release from twelve to four weeks and has 
significantly increased the test coverage compared with the 
traditional manual testing approach. 
Montana in the music department teaching electronic
music, and in the computer science department he
developed a noninvasive, continuous-wave blood
pressure monitor. In his free time, Charles pursues his
interest in primitive living skills and has taken several
trips, including one to the Utah desert, where he
supplemented his diet with grasshoppers, and another to
Northern Alberta, where he slept in a snow cave in
40-degree temperatures.

Michael J. Greenside 

Mike Greenside is a mechanical engineer in the product design
group at HP's Enterprise Systems Division and was the lead
engineer for the HP 9000 D-class server. He is currently
responsible for future high-end server product definition and
design. He received a BS degree in mechanical engineering and
material science in 1981 from the University of California at
Davis. He has been at HP for fifteen years and some of his
favorite projects include designing the peripheral bay for the
HP 9000 K-class server, redesigning the mechanical assembly for
the HP Windows Client PC, and designing the I/O expansion
enclosure for the HP 9000 T500. He is professionally interested
in design for manufacturability. Born in Spokane, Washington,
Mike is married and has two sons. Golf and volleyball are his
favorite sports.

Alisa Sandoval

Alisa Sandoval is a mechanical engineer in the product design
group at HP's Enterprise Systems Division. She recently worked
as a thermal lead engineer on the product definition for the
HP 9000 D-class server and is responsible for future high-end
server definition and design. She is professionally interested
in plastics design, thermal design and heat transfer, and
project management. In her thirteen years in design and
manufacturing at HP, two of her favorite projects include
codesigning and standardizing HP's EIA rack family and
codesigning the mechanical assembly for the HP Windows Client PC.
She received a BS degree in mechanical engineering from the
University of Nevada at Reno in 1982. After graduating she
worked at Becton Dickinson for a year in R&D and plastics
design. Alisa was born in Walnut Creek, California and has three
children. She is actively involved in the visiting scientist
school program and in Little League baseball. Her hobbies
include motorcycles, camping, baseball, and swimming. She also
enjoys arts and crafts.
Scott P. Allan

Scott Allan recently worked as a systems architect on the
B-class workstations at HP's Workstation Systems Division. He
also defined a clocking strategy and designed an ASIC to control
[...]-dependent hardware interface. He is currently responsible
for the memory control architecture for the next-generation
workstations. Scott received a BSEE degree, specializing in
computer science, from the University of Colorado in 1982. The
next year he joined HP's Desktop Computer Division and spent
three years working in manufacturing on HP 9000 Series 200
desktop computer products. He has spent the last eleven years
doing primarily ASIC design and system architecture on HP's
workstation products, including designing all, or portions of,
six ASICs ranging in functionality from the memory controller
and ECC chip for the HP 9000 Series 300 and 400 families of
workstations, to a bus bridge for VME embedded workstations and
a serial port megacell used in more than ten ASICs at HP. Scott
is currently working on an MSEE degree from Stanford University
and hopes to graduate in June 1998. Scott is married and has two
stepdaughters. He has spent his life along the Colorado front
range enjoying outdoor activities such as skiing, road and
mountain biking, and triathlons. He also scuba dives anywhere
it's warm.



Bruce P. Bergmann

[...] plane and the fast-wide [...] B-class workstation [...]
now the system architect and system board designer for a two-way
SMP (symmetrical multiprocessor) workstation. He earned a BSEE
degree in 1975 from the Case Institute of Technology and an MS
degree in electrical engineering and applied physics in 1976
from Case Western Reserve University. After graduating he joined
HP's Calculator Products Division. He worked as an electrical
design engineer for many of the HP 9000 Series 300, 400, and 700
workstations. His favorite projects include designing a graphics
controller chip for the HP 9000 Models 310 and 320 workstations,
designing a DMA control chip for the Models 330, 350, and
follow-ons, and designing the CPU board for the Model 382. Bruce
is married and has two children. He has been a certified
aerobics instructor for seven years and enjoys walleye fishing
as a master angler, fine woodworking, and downhill skiing.

Ronald P. Dean 

Ron Dean is a development engineer at HP's Workstation Systems
Division and recently worked on the mechanical definition and
design of the B-class workstation's enclosure, main tray, and
power supply. He is named as an inventor in three pending
patents on the heatsink design, card guide, and alignment
mechanism. He is currently working on upgrades to the B Series
and J Series workstations. He earned a BSME degree in 1977 and
then joined HP's Calculator Products Division. He worked on the
HP 9000 Series 300 doing mechanical product design, including
the case parts and cooling subsystems, and published an article
about his work in the HP Journal. He then went on to work on the
HP 9000 Series 500 workstations and was responsible for the
overall interconnection strategy, case parts, cable, and boards.
He also contributed to the HP 9000 Series 700 industrial
computers, developing case parts, boards, interconnect, card
cages, and backplanes. Born in Dearborn, Michigan, Ron is
married and has four children. He is a licensed engineer and in
his free time enjoys cross-country skiing, woodworking, and
bridge.
Dianne Jiang

An R&D engineer at HP's Workstation Systems Division, Dianne
Jiang worked on the processor and system verification for the
HP 9000 B-class workstation. Born in Jilin, China, Dianne
received a BS degree in 1984 and an MS degree in 1987, both in
solid state physics from Nankai University in China. She went on
to earn an MSEE degree in 1995 from Texas A&M University, where
she worked as a research assistant in the electrical engineering
department on a semiconductor laser signal processing system and
in the physics department doing research on electronic transport
of high-temperature superconducting materials. She joined HP in
1995 and one of her favorite projects since then includes
designing an ASIC turn-on board for the HP B-class workstation.
Dianne is married and has a son. In her free time she enjoys
reading, music, walking, and family activities.

Dennis L. Floyd

A hardware design engineer at HP's Workstation Systems Division,
Dennis Floyd was recently responsible for the design of the
system board for the B-class workstation. He is currently
responsible for verifying a memory and I/O controller ASIC
design. Dennis joined HP in 1988 after earning an MSEE degree
from the University of Minnesota. He spent the first five years
at HP's Computer Manufacturing Division, introducing
workstations and industrial controllers into the production
process. He then spent two years working on the test and
verification of VME-based embedded controllers. Born in
Danville, Kentucky, he received a BSEE degree from the
University of Kentucky in 1986. Dennis is married and enjoys
outdoor sports such as skiing, bicycling, and hiking.
Evangelos Nikolaropoulos

A software quality engineer at HP's Patient Monitoring Division,
Evangelos Nikolaropoulos was the software quality lead for the
latest patient monitor in the HP OmniCare family. He is
professionally interested in quality systems, verification and
validation methods, and product generation processes. Evangelos
joined HP in 1986 and initially worked on the design and
implementation of order processing systems, then began doing
software quality engineering for medical products. He is now
responsible for software quality assurance and provides guidance
for product development. He earned a master's degree in
economics from the University of Athens in 1976 and a Diploma in
computer science and operations research in 1981 from the
University of Fribourg in Switzerland. After graduating he
worked as a research assistant at the University doing
statistical modeling. He then worked on logistic systems for the
Greek Army, and later designed databases for the National
Hellenic Research Foundation in Athens, Greece. Evangelos was
born in Athens. In his free time he likes to read and learn
foreign languages.
Andreas Pirrung

Andreas Pirrung is an R&D engineer at HP's Patient Monitoring
Division and is working on the design and implementation of the
upper layer protocol software for LAN communication in an HP
patient monitor. He is professionally interested in artificial
intelligence, machine learning, and software engineering.
Andreas was born in Neustadt/Weinstrasse, Germany and received a
Diploma in computer science from the University of Karlsruhe. He
joined HP in 1993 as a software quality engineer. In his free
time, Andreas enjoys swimming, paragliding, and reading.
Jorg Schwering

A software quality engineer at HP's Patient Monitoring Division
since 1991, Jorg Schwering develops and provides consultations
on product generation guidelines. He is professionally
interested in product generation processes and software testing
techniques. He joined HP in 1988 and worked half-time while
attending college until he received a Diploma in computer
science in 1991 from Berufsakademie in Stuttgart, Germany. Jorg
was born in Steinfurt, Germany, is married, and has an infant
son.
Evangelos Nikolaropoulos

Author's biography appears elsewhere in this section.

Jorg Schwering

Author's biography appears elsewhere in this section.

Andreas Pirrung

Author's biography appears elsewhere in this section.
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